A question: How should you score the pentathlon? What is the ‘correct’ way to add times and distances?
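One principled answer – though not the official one, since real combined-events scoring uses published points tables – is to standardise each event's results so that seconds and metres become comparable before adding them. The sketch below, with invented athletes and results, converts each event to z-scores, flips the sign for timed events (where lower is better), and sums:

```python
# A sketch of one principled answer: convert each event's raw results to
# z-scores so that seconds and metres are on the same scale, negate timed
# events (lower is better), then sum. All data are invented.
from statistics import mean, stdev

def z_scores(values, lower_is_better=False):
    """Standardise a list of raw results to mean 0, spread 1."""
    m, s = mean(values), stdev(values)
    scores = [(v - m) / s for v in values]
    return [-z for z in scores] if lower_is_better else scores

# Invented results for three athletes in two events.
run_200m = [22.5, 23.1, 24.0]    # seconds: lower is better
long_jump = [7.10, 6.80, 6.95]   # metres: higher is better

totals = [sum(pair) for pair in zip(
    z_scores(run_200m, lower_is_better=True),
    z_scores(long_jump))]
ranking = sorted(range(len(totals)), key=lambda i: -totals[i])
```

The design choice hidden here is the weighting: z-scores weight each event by the spread of that day's field, whereas official points tables weight by a fixed historical standard – which is exactly why the 'correct' way is a genuine question.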
Deciding on the assessment techniques for your module or course should be the second activity in the constructive alignment process, after defining the intended learning outcomes and before considering teaching and learning approaches. There are of course many ways of assessing students – certainly many more than the closed-book end-of-module examination and the fortnightly piece of ‘coursework’. The following list is an aide-mémoire.
- Closed-book examination;
- Open-book examination;
- On-line test, involving some or all of:
  - Multiple-choice questions (MCQ) with single or multiple answers;
  - Word-completion exercises;
  - Numerical questions;
  - Randomised questions;
  - Clicking on images;
  - Selection of a few questions from a larger bank;
  - Hints and feedback, which you could provide automatically;
- Oral presentation with or without questions;
- Oral examination on a predetermined topic;
- Oral examination on open topics;
- Written report (with or without a pro-forma);
- Designs or manufactured artefacts;
- Poster or e-poster;
- Assignment involving numerical or essay questions;
- A portfolio of work, or an e-portfolio;
- A wiki.
When devising assessment tasks you should bear in mind a few questions:
- Is this assessment formative (they learn from it) or summative (they get a mark for it)? Could it be both? Even if it is summative, should the students have their script or other work returned, so they can learn from it or even cherish it? I would argue that even the final undergraduate exams are merely the start of life-long learning, and so need to be formative;
- How are you going to give feedback to the students? Can you do this individually or must it be generic? How long after the hand-in date can you provide feedback? [see below];
- Do you require the same ‘pass mark’ for all material, or are there some elements which it is essential for the student to know or understand? You might assess ‘core material’ differently (e.g. with a local 100% pass mark), either within a single examination or separately;
- Is your motive in setting this assessment to allow the students to demonstrate competence (without which they should probably not be allowed to practise), or to differentiate the smart from the average, so that you can label them differently on graduation (firsts vs lower seconds)? Or to give credit for further non-directed study?;
- Could you (while maintaining fairness) use a multiple-choice (MCQ) or multiple-response format which could be set in e-form and would not need to be re-invented next year?;
- Could you use questions from a question bank, either of your own devising or offered from elsewhere? There are several large question banks out there, at varying stages of development. You might like to have a look at the Learning Catalytics site and at the collections being developed by the AIChE and CDIO;
- Have you considered getting the students to write their own assessment questions and tasks? This is a very good learning experience for most students;
- Are you testing knowledge or understanding (or other levels of the Bloom taxonomy – see Chapter 2.3)? If you are not simply testing recalled knowledge, could the students reasonably have books and notes available? Do they need a hard time limit?;
- Could some of the assessed items be gateways to further study or achievement? In other words, they could carry zero ‘marks’ but a pass could be required before further material or summative assessments are released.
You might wonder why, at least in the UK, we expect students to write by hand in most exams whereas we demand word-processed text on almost every other occasion. It should be no surprise that it is often difficult to read exam scripts.
Multiple-choice questions (MCQs) are of course very attractive in principle because they eliminate much of the chore of marking and are ideally suited to delivery on-line. However, before you rush off and re-write your exam in this form, it is worth considering some of the pros and cons. Good MCQs, which discriminate between students who understand and those who do not, are not easy to write and will take time. But of course you can use them again. Before you start, find out whether your institution offers on-line testing and/or paper-based forms which can be machine-read. Increasingly you should have available a system which enables you to:
- Ask a variety of question types;
- Expect single or multiple correct answers;
- Allocate different marks to different questions (so that you can mix short and long tasks);
- Analyse the distribution of answers to each question.
If you have all this available then try to write your questions so that:
- There are no ‘silly’ answers which can be eliminated as obviously wrong. This is actually rather difficult and you will spend most of your time composing plausible wrong answers, for both qualitative and numerical questions;
- You learn something about the misconceptions of the students, to inform the development of your class next year;
- You are able to differentiate between understanding and recall.
Because the MCQs are computer-marked you will usually have better data about the range of student performance than for a conventional exam. This enables you to weed out those questions which either all or none of the students get right, and to adjust your teaching in future. If you are seeking only to differentiate the students (and are willing to forgo the formative aspect of the test) then you can probably find just a few questions which regularly divide the students into those who get it and those who don’t. Being cynical, at one level, that is all you need and it would be provided by a very short exam.
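The weeding-out described above is usually done with a simple item analysis. The sketch below – on an invented response matrix, with hypothetical helper names – computes each question's facility (the fraction of the class answering correctly) and a crude upper/lower discrimination index; a question everyone gets right (or wrong) shows zero discrimination and is a candidate for removal:

```python
# A minimal item-analysis sketch. Rows are students, columns are
# questions, 1 = correct answer. All data are invented.
def item_analysis(responses):
    """Return a (facility, discrimination) pair per question.

    facility: fraction of the class answering correctly;
    discrimination: top-half facility minus bottom-half facility,
    splitting the class by total score (a simple upper/lower index)."""
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    order = sorted(range(n_students), key=lambda i: totals[i])
    half = n_students // 2
    bottom, top = order[:half], order[-half:]
    stats = []
    for q in range(n_items):
        facility = sum(row[q] for row in responses) / n_students
        disc = (sum(responses[i][q] for i in top)
                - sum(responses[i][q] for i in bottom)) / half
        stats.append((facility, disc))
    return stats

# Invented data: 4 students, 3 questions. Question 0 is answered
# correctly by everyone, so it discriminates nothing; question 2 is
# answered correctly only by the stronger half of the class.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]
stats = item_analysis(responses)
```

In practice you would use a more robust statistic (such as the point-biserial correlation), but even this crude index is enough to spot the questions that are doing no work.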
A radical but very stimulating method of oral assessment has been practised for more than 20 years in the discipline of Electrical Engineering in the Faroe Islands [Jensen, 2010]. Students are given 60 minutes to prepare a presentation on a whiteboard and twenty minutes to explain and defend it to two members of staff. A list of twenty or more potential topics, spanning the syllabus, is published at the start of the module, and the students do not know which of these they will have to present until they enter the examination room. This style encourages the students to prepare across the whole syllabus, and allows the examiners to explore understanding as well as recall. It works well for modest-sized classes (at twenty minutes per student you can assess twenty or so students per day) and there is almost no time required to devise the assessment, so the examining time itself is essentially the total time commitment for assessment.
Life is an open-book exam.
Finally – a comment about the re-scaling or other adjustment of the marks from any assessment instrument. It is quite common for examiners to re-scale sets of marks, usually when the mean or distribution of the marks seems to be much higher or lower than expected. There are many ways of doing this, despite a dearth of papers describing them, but the more important question to address is why you might be doing it. The only serious justification I have heard is the pragmatic ‘surely we are not going to fail this large fraction of this cohort’ (in response to a particularly low set of marks). This is of course no (educational) justification at all, although it might be a realistic response to financial pressures!
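For concreteness, the commonest form of re-scaling is a linear map of the raw marks onto a chosen mean and spread. The sketch below shows one such transformation; the target values, the raw marks, and the choice to clip into the 0–100 range are all invented examples, not a recommendation:

```python
# One common linear re-scaling: shift and stretch a set of marks to a
# chosen mean and spread, clipping to the 0-100 range. The targets and
# raw marks below are invented for illustration.
from statistics import mean, stdev

def rescale(marks, target_mean=58.0, target_sd=12.0, cap=100):
    """Linearly map marks onto (target_mean, target_sd), clipped to [0, cap]."""
    m, s = mean(marks), stdev(marks)
    return [min(cap, max(0, target_mean + (x - m) * target_sd / s))
            for x in marks]

raw = [31, 38, 42, 45, 55]   # an invented 'anomalously low' set of marks
adjusted = rescale(raw)
```

Note what the transformation does and does not change: the rank order of the students is preserved exactly, which is precisely why it answers the pragmatic 'surely we are not going to fail this fraction' objection without addressing any educational question.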
So where scaling is used, is it a short-term fix to ameliorate the effect of a poorly-devised exam or a badly-taught module, or is it part of a strategy to avoid the distortion of a student’s average grades by a small number of ‘anomalously’ high or low marks? I can find no serious, thoughtful writing on this subject but I have been told by respected academic friends that some cohorts of students, of equivalent entry standard, although apparently taught the same material in the same way and having been set an exam closely similar to previous years, nevertheless deliver radically different sets of marks. My own thoughts on this behaviour are:
- If the cohort was of very different ability or put in a very different amount of effort, then their ‘anomalous’ marks should surely stand;
- But perhaps the assessment instrument (e.g. exam) has a very large random noise element and an error bar of perhaps ±10 percentage points – in which case surely we should work to improve the assessment instrument and/or average a lot of such sets of ‘uncorrected’ marks.
In neither case is re-scaling justified, in my opinion. A potential explanation for the anomalous behaviour might arise from the group dynamics of a class of students. It is often reported (anecdotally – I have not seen the hard evidence, but I don’t read much sociology) that the behaviour of a whole group can be influenced to a significant extent by a few opinion leaders. It might become the accepted wisdom among a particular year group that a particular module is ‘difficult’ or ‘not worth the effort’. It might also happen that a key threshold concept (see Chapter 2.4) is not mastered by the group leaders and thus is not effectively transmitted around the class. These might be interesting issues for future educational research projects, but it is difficult to see why one should manipulate exam marks to deal with them.
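The error-bar point above can be made concrete with a toy simulation. If each assessment of the same student carries independent noise with a spread of about 10 points, averaging k such 'uncorrected' sets of marks shrinks the spread by roughly √k. The numbers below are invented, and Gaussian noise is an assumption:

```python
# A toy simulation of assessment noise: averaging k independent noisy
# measurements of the same 'true' mark narrows the error bar by ~sqrt(k).
import random

random.seed(0)
TRUE_MARK = 60.0   # the student's 'real' level (invented)
NOISE_SD = 10.0    # roughly the +/-10-point error bar suggested above

def observed_mark(k):
    """Average of k independent noisy assessments of one student."""
    return sum(random.gauss(TRUE_MARK, NOISE_SD) for _ in range(k)) / k

def spread(k, trials=20000):
    """Empirical standard deviation of the k-assessment average."""
    samples = [observed_mark(k) for _ in range(trials)]
    m = sum(samples) / trials
    return (sum((s - m) ** 2 for s in samples) / trials) ** 0.5

one_exam, four_exams = spread(1), spread(4)
```

With these numbers the single-exam spread comes out near 10 points and the four-exam average near 5: averaging four independent sets of marks roughly halves the error bar, which is the case for averaging rather than re-scaling.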
Read on … (but first please add a comment)