It is not only the setting of assessments which affects their validity and reliability, but also how they are marked (or graded). The options which are available to you for marking need to be taken into account at the point of design.
In some cases, such as problem sheets, the design of the assessment takes longer than the marking, and usually the scheme is fairly self-evident. The learning being tested is usually convergent, which means that correct answers are clear, and the only real problems concern half-correct answers: if someone has got the answer to a maths problem wrong, do you give credit for the fact that they only went wrong in the latter stages of the working?
Whatever the decision, it is fairly easy to be consistent and hence reliable in its application.
This is less true in the case of essay-type questions. In fact, one of their problems is that they are so easy to set—most experienced teachers can think of an essay question off-the-cuff in fifteen seconds—that we often have little clear idea of what we will get back.
In the case of basic-level work, it is possible to determine a marking scheme which gives a set number of marks for mentioning particular issues:
Outline the longer-term consequences of the Schleswig-Holstein question. (5 marks)
gives a marking scheme of:
Lord Palmerston commented (1 mark)
that only three people ever understood the question (1 mark)
and of them:
- one was dead
- one was mad
- and he himself had forgotten (3 marks)
This approach is used to maintain consistency in the marking of large-scale examinations where a number of markers are used (e.g. GCSEs and AS and “A” levels in the UK), but even there it may be supplemented by marks awarded for more global factors, such as clarity of expression.
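The point-by-point scheme above can be sketched as a simple checklist that the marker ticks off. This is only an illustration: the `SCHEME` dictionary and the `checklist_mark` function are hypothetical names, and the final three sub-points are kept as one 3-mark item, as in the example, rather than split into single marks.

```python
# Creditable points and their marks, taken from the example scheme above.
# The three closing sub-points are grouped as one 3-mark item, as in the text.
SCHEME = {
    "Lord Palmerston commented": 1,
    "only three people ever understood the question": 1,
    "one was dead, one was mad, and he himself had forgotten": 3,
}


def checklist_mark(points_found):
    """Add up the marks for the points the marker has ticked."""
    ticked = set(points_found)  # a point is credited once, however often it appears
    unknown = ticked - set(SCHEME)
    if unknown:
        raise ValueError(f"not in the marking scheme: {sorted(unknown)}")
    return sum(SCHEME[point] for point in ticked)


full_answer = list(SCHEME)
print(checklist_mark(full_answer))
```

Because every marker works from the same list, two markers who agree on which points are present will always arrive at the same total.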
The temptation when marking substantial numbers of essays is to rush to a global mark, which takes into account a large number of factors, and facilitates comparison among members of the student group, but is probably highly unreliable, even when accompanied by a few remarks scrawled in the margin and at the end of the submission. Such a mark is often based on the teacher’s conviction that, “I may not be able to describe a 57% (or a C+) essay, but I know one when I see one”. Unfortunately (?), this is not good enough. For one thing, consider how many times you have marked a run of half-a-dozen disappointing essays, and then come across a moderately good one, to which you have given a higher mark than it deserves, out of sheer relief!
One way forward
As ever, the alternative is to ask, when setting the assignment in the first place, “Just what do I want the students to demonstrate?”
You may decide that there are five major factors, such as (just for the sake of this argument—they will not apply to every subject):
- Knowledge: demonstration of knowledge of the content of the module.
- Critical understanding: the ability to bring critical understanding to bear on the material, not accepting everything at face value, and exercising reasonable judgement about what is important and what is not.
- Use of Sources: evidence of reading, both from the set texts and beyond them, and appropriate appeal to authorities to support and refute arguments.
- Argument: the overall construction of the argument of the essay, including the drawing of relevant conclusions.
- Structure and expression: the essay as a piece of writing (its flow, style, and grammatical construction).
Try to make each of these factors as independent of the others as possible (which is more difficult than it seems, as this less-than-perfect example shows).
- It is good practice to have a mark-sheet which uses such standard headings, and can then be used for feedback to the students. With large numbers, this may have to be automated (see below): with smaller numbers, you can comment individually.
Next, think about the various levels at which each of these may be demonstrated on, say, a five-point scale, where “1” is low and “5” is high.
- You could of course go from “0” to “4”, which is probably more “accurate”, but this could result in an overall mark of “0”, and although this may reflect your feelings about a particular piece of work, the convention is in practice that students get at least 15% for simply trying!
- Mark the essays, giving them an appropriate mark (1-5) on each factor.
- Construct the overall numerical mark, probably out of 100, by weighting each of the factors according to its perceived importance in relation to the task as a whole. Thus “Structure” may be much less relevant as a criterion of assessment than “Knowledge”: so Structure is incorporated x2 (a “4” for Structure = 8% of the final mark), but Knowledge is weighted x5 (a “4” for Knowledge = 20% of the final mark). Once the final mark has been calculated, this can if necessary be translated back into a nominal grade (A, B, C, etc.).
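The weighting step can be sketched in a few lines of Python. Only the x5 weight for Knowledge and the x2 weight for Structure come from the text. The other three factor labels and weights are assumptions, chosen so that the weights add up to 20 and a “5” on every factor gives exactly 100.

```python
# Weight applied to each factor's 1-5 score.
# Knowledge (x5) and Structure (x2) are from the text; the rest are assumed.
WEIGHTS = {
    "knowledge": 5,
    "critical_understanding": 5,  # assumed weight
    "use_of_sources": 4,          # assumed weight
    "argument": 4,                # assumed weight
    "structure": 2,
}


def overall_mark(scores):
    """Return the weighted mark out of 100 from a 1-5 score on each factor."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the factors in WEIGHTS")
    for factor, score in scores.items():
        if score not in range(1, 6):
            raise ValueError(f"{factor}: score must be a whole number from 1 to 5")
    return sum(WEIGHTS[factor] * score for factor, score in scores.items())


example = {
    "knowledge": 4,               # contributes 20
    "critical_understanding": 3,  # contributes 15
    "use_of_sources": 3,          # contributes 12
    "argument": 4,                # contributes 16
    "structure": 4,               # contributes 8
}
print(overall_mark(example))  # 71
```

With these particular weights, the lowest possible mark (a “1” on every factor) is 20 and the highest is 100.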
It is of course possible to develop a little mail-merge macro to do this for you, and to generate a useful feedback sheet by automatically inserting the descriptor written for each factor according to the score on the five-point scale.
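The text has a mail-merge macro in mind; the same idea can be sketched in Python instead. The descriptor wording below is purely hypothetical placeholder text, only two of the factors are filled in to keep the example short, and `feedback_sheet` is an illustrative name rather than an existing tool.

```python
# Hypothetical descriptors: one sentence per level (1-5) for each factor.
DESCRIPTORS = {
    "Knowledge": {
        1: "Little knowledge of the module content is shown.",
        2: "Some knowledge is shown, but with significant gaps.",
        3: "Sound knowledge of the main content.",
        4: "Good, accurate knowledge across the module.",
        5: "Thorough and confident command of the content.",
    },
    "Structure": {
        1: "The essay is hard to follow.",
        2: "The structure is uneven and the expression often unclear.",
        3: "Adequately organised and generally clear.",
        4: "Well organised and clearly written.",
        5: "Fluent, well shaped, and a pleasure to read.",
    },
}


def feedback_sheet(student, scores):
    """Build a plain-text feedback sheet from a student's 1-5 scores."""
    lines = [f"Feedback for {student}"]
    for factor, score in scores.items():
        # Look up the sentence that matches this factor and level.
        lines.append(f"{factor} ({score}/5): {DESCRIPTORS[factor][score]}")
    return "\n".join(lines)


sheet = feedback_sheet("Student A", {"Knowledge": 4, "Structure": 2})
print(sheet)
```

Writing the descriptors once, in advance, is what keeps the feedback consistent from one script to the next; individual comments can still be added by hand.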
The net result is that if you are ever asked to justify your marking, you will be able to do so without trouble and, much more important, the students are getting feedback and grading which is as reliable as you can make it given the inherent subjectivity of the task. More to the point, if there are several of you assessing a large cohort, agreeing such a scheme in advance adds immeasurably to the consistency of the marking.
Under Creative Commons License: Attribution Non-Commercial No Derivatives