5.13 Assessment of Learning

A question: How should you score the pentathlon? What is the ‘correct’ way to add times and distances?

Deciding on the assessment techniques for your module or course should be the second activity in the constructive alignment process, after defining the intended learning outcomes and before considering teaching and learning approaches. There are of course many ways of assessing students – certainly many more than the closed-book end-of-module examination and the fortnightly piece of ‘coursework’. The following list is an aide-mémoire.

  1. Closed-book examination;
  2. Open-book examination;
  3. On-line test, involving some or all of:
     • multiple-choice questions (MCQs) with single or multiple answers;
     • word-completion exercises;
     • numerical questions;
     • randomised questions;
     • clicking on images;
     • selection of a few questions from a larger bank;
     • hints and feedback for the students;
  4. Oral presentation with or without questions;
  5. Oral examination on a predetermined topic;
  6. Oral examination on open topics;
  7. Written report (with or without a pro-forma);
  8. Designs or manufactured artefacts;
  9. Poster or e-poster;
  10. Assignment involving numerical or essay questions;
  11. A portfolio of work, or an e-portfolio;
  12. A wiki.

When devising assessment tasks you should bear in mind a few questions:

  • Is this assessment formative (they learn from it) or summative (they get a mark for it)? Could it be both? Even if it is summative, should the students have their script or other work returned, so they can learn from it or even cherish it? I would argue that even the final undergraduate exams are merely the start of life-long learning, and so need to be formative;
  • How are you going to give feedback to the students?  Can you do this individually or must it be generic? How long after the hand-in date can you provide feedback? [see below];
  • Do you require the same ‘pass mark’ for all material, or are there some elements which it is essential for the student to know or understand? You might assess ‘core material’ differently (e.g. with a local 100% pass mark), either within a single examination or separately;
  • Is your motive in setting this assessment to allow the students to demonstrate competence (without which they should probably not be allowed to practise), or to differentiate the smart from the average, so that you can label them differently on graduation (first vs lower seconds)? Or to give credit for further non-directed study?;
  • Could you (while maintaining fairness) use a multiple-choice (MCQ) or multiple-response format which could be set in e-form and would not need to be re-invented next year?;
  • Could you use questions from a question bank, either of your own devising or offered from elsewhere? There are several large question banks out there, at varying stages of development.  You might like to have a look at the Learning Catalytics site and at the collections being developed by the AIChE and CDIO;
  • Have you considered getting the students to write their own assessment questions and tasks? This is a very good learning experience for most students;
  • Are you testing knowledge or understanding (or other levels of the Bloom taxonomy – see Chapter 2.3)? If you are not simply testing recalled knowledge, could the students reasonably have books and notes available? Do they need a hard time limit?;
  • Could some of the assessed items be gateways to further study or achievement? In other words, they could carry zero ‘marks’ but a pass could be required before further material or summative assessments are released.

You might wonder why, at least in the UK, we expect students to write by hand in most exams whereas we demand word-processed text on almost every other occasion.  It should be no surprise that it is often difficult to read exam scripts.

Multiple-choice questions (MCQs) are of course very attractive in principle because they eliminate much of the chore of marking and are ideally suited to delivery on-line. However, before you rush off and re-write your exam in this form, it is worth considering some of the pros and cons. Good MCQs, which discriminate between students who understand and those who do not, are not easy to write and will take time. But of course you can use them again. Before you start, find out whether your institution offers on-line testing and/or paper-based forms which can be machine-read. Increasingly you should have available a system which enables you to:

  • Ask a variety of question types;
  • Expect single or multiple correct answers;
  • Allocate different marks to different questions (so that you can mix short and long tasks);
  • Analyse the distribution of answers to each question.

If you have all this available then try to write your questions so that:

  • There are no ‘silly’ answers which can be eliminated as obviously wrong. This is actually rather difficult, and you will spend most of your time composing plausible wrong answers to both qualitative and numerical questions;
  • You learn something about the misconceptions of the students, to inform the development of your class next year;
  • You are able to differentiate between understanding and recall.

Because MCQs are computer-marked you will usually have better data about the range of student performance than for a conventional exam. This enables you to weed out those questions which either all or none of the students get right, and to adjust your teaching in future. If you are seeking only to differentiate the students (and are willing to forgo the formative aspect of the test) then you can probably find just a few questions which regularly divide the students into those who get it and those who don’t. Cynically, at one level that is all you need, and it could be provided by a very short exam.
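The 'weeding out' described above is essentially classical item analysis, and is easy to automate once the marks are in machine-readable form. The sketch below is hypothetical (the function name and the tiny response matrix are invented for illustration): it computes each question's facility (the proportion of students answering correctly) and a crude discrimination index (the mean total score of those who got the item right minus that of those who got it wrong), flagging questions that everyone, or no one, answered correctly.

```python
# A minimal, hypothetical sketch of item analysis for a computer-marked
# MCQ test. The response data below are invented for illustration.
from statistics import mean

def item_stats(responses):
    """responses: 0/1 matrix, rows = students, columns = questions.
    Returns (facility, discrimination) per question."""
    totals = [sum(row) for row in responses]      # each student's total score
    stats = []
    for q in range(len(responses[0])):
        item = [row[q] for row in responses]
        facility = mean(item)                     # proportion answering correctly
        right = [t for t, i in zip(totals, item) if i == 1]
        wrong = [t for t, i in zip(totals, item) if i == 0]
        # Crude discrimination: mean total of those who got the item right
        # minus those who got it wrong; undefined if everyone/no one is right.
        disc = mean(right) - mean(wrong) if right and wrong else None
        stats.append((facility, disc))
    return stats

marks = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1], [1, 0, 0, 0]]
for q, (fac, disc) in enumerate(item_stats(marks), start=1):
    flag = "  <- everyone/no one right: review" if disc is None else ""
    print(f"Q{q}: facility={fac:.2f}, discrimination={disc}{flag}")
```

A question with a facility near 1 or 0 tells you nothing about individual students (though it may tell you something about your teaching); a low or negative discrimination suggests the question is confusing or mis-keyed.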

A radical but very stimulating method of oral assessment has been practised for more than 20 years in the discipline of Electrical Engineering in the Faroe Islands [Jensen, 2010]. Students are given 60 minutes to prepare a presentation on a whiteboard and twenty minutes to explain and defend it to two members of staff. A list of twenty or more potential topics, spanning the syllabus, is published at the start of the module, and the students do not know which of these they will have to present until they enter the examination room. This style encourages the students to prepare across the whole syllabus, and allows the examiners to explore understanding as well as recall. It works well for modest-sized classes (at twenty minutes per student you can assess twenty or so students per day) and, since almost no time is required to devise the assessment, the examining itself is essentially the total time commitment.

Life is an open-book exam.

Finally – a comment about the re-scaling or other adjustment of the marks from any assessment instrument. It is quite common for examiners to re-scale sets of marks, usually when the mean or distribution of the marks seems to be much higher or lower than expected. There are many ways of doing this, despite a dearth of papers describing them, but the more important question to address is why you might be doing it. The only serious justification I have heard is the pragmatic ‘surely we are not going to fail this large fraction of this cohort’ (in response to a particularly low set of marks). This is of course no (educational) justification at all, although it might be a realistic response to financial pressures!
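For concreteness, one of the commonest of those 'many ways' is a linear transformation onto a chosen target mean and standard deviation. The sketch below is hypothetical (the function name, target values and raw marks are all invented) and is offered to make the mechanics clear, not as an endorsement of the practice:

```python
# Hypothetical sketch of one common re-scaling method: a linear map of a
# set of marks onto a chosen target mean and standard deviation.
from statistics import mean, stdev

def rescale(marks, target_mean, target_sd):
    m, s = mean(marks), stdev(marks)
    # z-score each mark, map onto the target distribution, and clamp
    # the result to the 0-100 range.
    return [max(0.0, min(100.0, target_mean + target_sd * (x - m) / s))
            for x in marks]

raw = [28, 35, 41, 47, 52, 60]   # an 'anomalously low' set of marks
print(rescale(raw, target_mean=58, target_sd=12))
```

Note that the clamping step means the transformation is only approximately linear near the ends of the scale, and that the rank order of the students is preserved – which is precisely why the method adjusts the cohort's apparent standard without addressing whatever caused the anomaly.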

So where scaling is used, is it a short-term fix to ameliorate the effect of a poorly-devised exam or a badly-taught module, or is it part of a strategy to avoid the distortion of a student’s average grades by a small number of ‘anomalously’ high or low marks?  I can find no serious, thoughtful, writing on this subject but I have been told by respected academic friends that some cohorts of students, of equivalent entry standard, although apparently taught the same material in the same way and having been set an exam closely similar to previous years, nevertheless deliver radically different sets of marks. My own thoughts on this behaviour are:

  • If the cohort was of very different ability or put in a very different amount of effort, then their ‘anomalous’ marks should surely stand;
  • But perhaps the assessment instrument (e.g. exam) has a very large random noise element and an error bar of perhaps ±10 percentage points – in which case surely we should work to improve the assessment instrument and/or average a lot of such sets of ‘uncorrected’ marks.

In neither case is re-scaling justified, in my opinion. A potential explanation for the anomalous behaviour might arise from the group dynamics of a class of students. It is often reported (anecdotally – I have not seen the hard evidence, but I don’t read much sociology) that the behaviour of a whole group can be influenced to a significant extent by a few opinion leaders. It might become the accepted wisdom among a particular year group that a particular module is ‘difficult’ or ‘not worth the effort’. It might also happen that a key threshold concept (see Chapter 2.4) is not mastered by the group leaders and thus is not effectively transmitted around the class. These might be interesting issues for future educational research projects, but it is difficult to see why one should manipulate exam marks to deal with them.
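The 'noisy instrument' point above can be illustrated numerically. In the hypothetical simulation below (the 'true' mark, noise level and function name are all invented), each exam reports a student's underlying mark plus Gaussian noise of about 10 percentage points; averaging several such exams shrinks the spread of the reported mark roughly as one over the square root of the number averaged:

```python
# Hypothetical illustration: averaging several noisy assessments of the
# same student narrows the spread of the reported mark (~ 1/sqrt(n)).
import random
random.seed(1)

TRUE_MARK = 62.0   # the student's underlying attainment (invented)
NOISE_SD = 10.0    # error bar of a single exam, in percentage points

def observed_average(n_exams):
    """Average of n_exams independent noisy measurements of TRUE_MARK."""
    return sum(random.gauss(TRUE_MARK, NOISE_SD) for _ in range(n_exams)) / n_exams

for n in (1, 4, 16):
    # Repeat many times to estimate the spread of the averaged mark.
    trials = [observed_average(n) for _ in range(10_000)]
    m = sum(trials) / len(trials)
    sd = (sum((t - m) ** 2 for t in trials) / len(trials)) ** 0.5
    print(f"{n:>2} exams averaged: spread of reported mark ~ {sd:.1f} points")
```

On this (admittedly simplistic) model, a single exam with a ±10-point error bar cannot reliably place a student within a degree classification band, whereas the average over many assessments can – which supports averaging uncorrected marks rather than re-scaling individual sets.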

Read on …  (but first please add a comment)

4 Responses to “5.13 Assessment of Learning”

  1. Oliver Broadbent

    I am working with my colleagues at Think Up on a new multiple-choice quiz designed to help civil engineers develop civil engineering knowledge, as a precursor to developing more sophisticated skills and understanding. The game is called Engineering Mastermind and it will be hosted on the Expedition Workshed website (http://expeditionworkshed.org). Our thinking is that students may not be aware of the sorts of things it would be useful to know, and it is not appropriate to use class time to list this information. We expect that Engineering Mastermind will be used in two ways. The first is that teaching staff will point their students towards specific topics in order to prepare them for lectures on that subject. The second is that students will see that their classmates are using the game, and will want to give it a go. A key element is that when students successfully complete a round they win a badge which they can post to Facebook. There will also be a university leader board to build a sense of competition.

    We tested an early-stage concept with students at the University of Edinburgh earlier this week, and they responded well to the questions. At the outset I had been worried about how students might cheat on the quiz, but I have since realised that it doesn’t matter. The very fact that students see a series of multiple-choice quiz questions that repeatedly refer to a set of objects or ideas they should be aware of will increase their awareness.

    Testing begins in January, and we look forward to seeing the impact of its use.

  2. Peter Goodhew

    I like your comment about “cheating”. The only problem that I can see with your understandably relaxed attitude is that cheating at the level of copying bypasses the thought process and therefore only achieves familiarity (the lowest level of achievement) rather than understanding.

  3. martingillie

    Hello, Thank you for a very nice book and interesting form of publication. I found it after discussions with Oliver Broadbent, who has been working with me at the University of Edinburgh as an RAEng Visiting Teaching Fellow on the structural engineering design teaching we do. (some details on my blog)

    I have lots of comments I would like to make on many of your chapters, all of which are thought-provoking, but will start with the topic of scaling marks. As university exam boards become increasingly formulaic, with rigid sets of rules to follow, scaling is one of the few areas where marks can be adjusted by collective academic judgement, rather than an individual marker’s judgement. This is, I think, a sound educational reason for using scaling. While individual markers may be the only ones with the technical knowledge to mark in a particular area, different markers may well have varying expectations of the type of work that is worth 70% (or whatever); a collective judgement is more consistent. This sort of consistency is particularly important when students on a degree programme can take optional courses. If expectations differ between courses, some students may be advantaged and others disadvantaged.

    Separately, you note the lack of literature on scaling. One article I am aware of is the following. The system described was used for a while in Civil Engineering at the University of Edinburgh but has since been abandoned for a simpler (and less rigorous) approach.

  4. Peter Goodhew

    There are some efforts to develop question banks for use both summatively and formatively. One such is the AIChE Concept Warehouse at http://cw.edudiv.org
