Knowing what they know

Today I watched David Weston’s rather good TEDx talk on developing teachers. It very clearly shines a light on the huge deficit in teacher development and is well worth watching if you have time.

But enough about him, I want to talk about the one thing he mentioned that links in to this tool. That thing … diagnostics.

One of the big advantages that experienced teachers have is their seemingly innate sense of what pupils know. It’s not innate though. New teachers really struggle to pitch learning correctly and often this undermines their efforts. After a few years of marking and evaluating their practice they develop a much more finely tuned sense of what pupils – generally – will know and what they will struggle with.

High quality diagnostics could transform this situation.

Firstly, if new teachers have this information live in lessons, they will learn much more quickly where their pupils are at and be able to adapt their teaching accordingly.

Secondly, experienced teachers with access to much more specific data about pupil knowledge will maximise challenge and progress.

Of course, there are methods in use already – formative assessment such as mini whiteboards – that sort of do this job. The technology exists to do much better – ask better questions, collect data more systematically.

In summary, I think I’d better crack on and build the reporting part of my tool.

P.S. The pilot is now open. If you want to have a try start here.

Making formative assessment easy(er)

Quizzing is a great way of doing formative assessment. However, if you have ever tried to write a multiple-choice quiz, you will know it’s hard.

It’s really hard.

It looks easy but you soon learn the road is paved with pitfalls and bear traps.


Why? Because of guessing.

Guessing is a massive confounding problem. To counteract it you have to:

  • Devise several plausible distractors
  • Avoid intuitive facts and concepts that pupils already have a partial grasp of
  • Ask lots of questions

Many question writers find themselves writing deliberately misleading questions as they try to find a non-obvious way of asking about a concept. This is both hard work and leads to invalid questions.

And it’s not just MCQ: I’ve been reading up on Dylan Wiliam and the idea of hinge questions*. Exactly the same problem exists for these quick checks as for a longer quiz – coming up with a question is surprisingly hard work. Why? Because it must be challenging to be useful and many questions (in the context of a lesson on the same subject) will be too easily answered through guesswork.

Certainty-marked assessments correct for guesswork. One of the biggest advantages this brings is that questions become easier to write. Even a lone true–false question (the easiest MCQ to write, but highly guessable) gains diagnostic significance if it can score between -6 and 3 marks. This means there are many more questions you can ask that have the diagnostic power you need to inform teaching and learning.
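For the curious, here is how a mark-scheme with that -6 to 3 range could look in code. This is a sketch using the weights from Gardner-Medwin’s certainty-based marking, which happen to fit that range exactly; the precise weights used in my tool should be treated as an assumption here.

```python
# Sketch of a certainty-based mark-scheme. The weights below are
# Gardner-Medwin-style and match the -6..+3 range described above;
# the tool's actual weights are an assumption.

MARKS = {
    1: (1, 0),   # low certainty ("guess"):   correct +1, wrong  0
    2: (2, -2),  # medium certainty ("think"): correct +2, wrong -2
    3: (3, -6),  # high certainty ("know"):    correct +3, wrong -6
}

def score(certainty: int, correct: bool) -> int:
    """Return the mark for one answer, given the stated certainty."""
    right, wrong = MARKS[certainty]
    return right if correct else wrong

# A confident right answer earns +3; a confident wrong answer costs -6.
print(score(3, True), score(3, False))  # 3 -6
```

Notice how a single true/false item now has five possible outcomes rather than two, which is where the extra diagnostic power comes from.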

Using a certainty-scoring system makes questions harder in a valid way and without additional teacher effort. This is exactly what edtech should achieve: teachers’ practice is more effective and their job is made easier by technology.

*So much so that there’s probably a whole post to write about how certainty-marked assessments – and the data they produce – would work for this purpose.




I’ve bitten the bullet and invited my first users into the system. There’s lots more to do, but the core is in place and I could hold back forever if I kept waiting until the next task was complete.

If you would like to join the pioneers, then get in touch using this form. It’s really early days for the system and you should be aware that things will change (for the better) as I get around to implementing and refining the features. I’ll be sharing a roadmap in due course too, so that you have an idea of what is coming.

Growth, mastery and other trends

Ten years ago Assessment for Learning hit a peak. I’m speaking here in terms of trendiness rather than anything substantial like the quality or quantity of the practice. Essentially, I’m talking about how prominently an education consultant would place it among their specialities.

Nowadays, the two chart-toppers are mastery-learning and growth-mindset. As with (most of) the trends that have come before there is a reasonably good idea lying behind them. At the same time there will be a spectrum of implementable practices advocated by advisers, consultants, inset providers and edu-publishers.

Bearing this in mind, I’m cautious about contriving tenuous links between my product and these trends. On the other hand, I keep seeing these links – when someone describes what they are trying to achieve and I think there’s a great solution.

I’ll be straight, the strong evidence for certainty-based questioning is for its accuracy and reliability, not for its pedagogical impact. Everything else is conjecture (as far as I am aware), based on my feeling that this diagnostic assessment could be really powerful. Please bear this in mind as I move on to describe how my product fits with these recent trends…

Mastery learning

Mastery learning is the idea that learning should be organised in steps and earlier steps should be mastered before progression to the next. Mastery is achieved through practice, feedback and corrective procedures.

Clearly, to take this approach, there must be a way of assessing levels of mastery. From what I’ve seen, this usually consists of finding that a pupil can consistently (3 times in a row?) demonstrate the skill at the required level.

I believe ‘personal certainty’ (when stated honestly) is a better indication of ‘mastery’. If a pupil can correctly answer a question with certainty then you have good evidence that they have mastered that concept. There are several benefits over repeated checking as a way of determining mastery:

  • More accuracy – sensitive to sustained uncertainty
  • Less time repeatedly assessing
  • More pupil self-awareness

Growth mindset

So, this is the trickier one. In fact, much of growth mindset rails against testing, so is it a contrivance too far to claim that an assessment approach supports it? As far as I understand, growth mindset is impeded by testing that reduces things to right/wrong, pass/fail. Assessing certainty creates a much broader spectrum, meaning you can get everything correct and still see that you can get better. Pupils can also see how achievable ‘better’ is – whether that means unlearning misconceptions or strengthening their understanding of concepts.

Maybe it is too much to claim that this assessment approach supports a growth mindset, but I would definitely argue that it supports it to a much greater extent than regular closed questioning.

Assessment for learning

Ultimately, I think the edu-notion that certainty-assessment most closely aligns to is AfL. It contains all of the most important ingredients of good AfL, with technology adding additional value through speed and data collection. I’m very excited to see what impact it will have in the classroom.

N.B. The tool is almost ready for piloting. Keep watching this space!

Minding my language

Possibly one of the trickiest design decisions (and one for which I don’t think I have yet found the best option) is about what wording to use around some of the unfamiliar features of this assessment type.

The certainty scale

One of the key features is the certainty scale. At the moment I’m using guess-think-know, which I like because I think there is a good semantic distinction between them and they hopefully correspond to people’s natural ways of describing levels of certainty. The drawbacks are potential differences in interpretation and the clumsy phraseology needed to explain ‘think’ to users.

This is not the approach that previous implementations have used. I can see good arguments for simply calling the buttons 1, 2, 3. This is a much more objective description of what they deliver and might avoid unfairly penalising those who perceive their knowledge using more self-deprecating language. However, I fear that it is a bit abstract for younger pupils and might be off-putting simply because of its technical appearance.

The categories

I think the titles for the categories are due for a change before I start the pilot. I wrote them pretty quickly and I don’t think the language is really clear or motivational enough (given that the message is quite tough).

I’m happy with ‘Secure knowledge’ for the top scores (3, 2).

I think ‘Unsecured knowledge’ might work better than ‘Guessing’ as it implies that this knowledge will be secured in the future. Although scored differently, I don’t believe it is that useful to differentiate between correct and incorrect at this level of certainty. Effectively, I think we should communicate, ‘If you don’t know it with any certainty, you have work to do’.

Finally, I think ‘Misunderstanding’ seems gentler than ‘Misconception’. It allows ‘misunderstanding the question’ as an explanation, so avoids criticising in a way that may seem unjust. I’m going to try sub-categories of ‘Misunderstanding’ and ‘Serious misunderstanding’ in the next iteration.
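As a sketch, the category logic could look like this in code. The category names are the ones discussed above; exactly which certainty/correctness combinations fall into each category (and the three-level certainty scale itself) is my working assumption.

```python
# Hypothetical mapping from (certainty, correct) to a feedback
# category. Names follow the post; the cell boundaries are assumed.

def category(certainty: int, correct: bool) -> str:
    if correct and certainty >= 2:
        return "Secure knowledge"        # the top scores (3, 2)
    if certainty == 1:
        return "Unsecured knowledge"     # guessing, right or wrong
    # Confident but wrong: a misconception of some degree.
    return "Serious misunderstanding" if certainty == 3 else "Misunderstanding"

print(category(3, True))   # Secure knowledge
print(category(1, False))  # Unsecured knowledge
print(category(3, False))  # Serious misunderstanding
```

One design point this makes concrete: correct and incorrect answers at the lowest certainty land in the same category, matching the message that unsecured knowledge needs work either way.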

Adapting to audience

Ultimately, there will be no perfect solution that I can author. It seems fairly obvious to me that people will want to adapt the language to fit the context in which they are using these assessments. So a task, definitely on my to-do list, is to provide the feedback panel elements as an editable form so that users can adapt the language for their audience.

The need for scoring

One interesting piece of feedback I received on my prototype was a question about the need to share a single score. Does this not undermine all the rich data and reduce pupil performance to a number – just like any other assessment?


Another suggestion was to share a more traditional correct/total score as well as the one based on the certainty-based mark-scheme.

My feeling is that the former is essential and the latter undermines everything. Here’s why…

Scoring leads to honesty

The purpose of the mark-scheme is not so much to generate the final number as to influence the certainty selection behaviour during the assessment. As it stands, the mark-scheme is ungameable – the best strategy is honesty. The only thing you can do to improve scores (other than learning things better) is to heighten your self-awareness of your certainty level (I’ll be writing about this another day).
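To see why honesty wins, work out the expected mark at each certainty level for a pupil whose real chance of being right is p. The sketch below assumes the same -6 to 3 weights as the true/false example earlier; under those weights, the certainty level that maximises your expected score is always the one that matches your actual confidence.

```python
# Expected mark at each certainty level, assuming -6..+3 weights:
#   level 1: 1*p + 0*(1-p) = p
#   level 2: 2*p - 2*(1-p) = 4p - 2
#   level 3: 3*p - 6*(1-p) = 9p - 6

def expected_mark(certainty: int, p: float) -> float:
    right, wrong = {1: (1, 0), 2: (2, -2), 3: (3, -6)}[certainty]
    return right * p + wrong * (1 - p)

def best_choice(p: float) -> int:
    """The certainty level that maximises expected score."""
    return max((1, 2, 3), key=lambda c: expected_mark(c, p))

# Shaky knowledge -> claim low certainty pays best; secure -> claim high.
print(best_choice(0.5), best_choice(0.75), best_choice(0.9))  # 1 2 3
```

The crossover points (p = 2/3 and p = 0.8 under these weights) mean bluffing high certainty on shaky knowledge costs you marks on average – exactly the incentive for honest reporting.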

If we take away the scoring, we take away this incentive for honest reporting of certainty and, as a result, lose one of the most insightful aspects of the assessment.

Scoring makes improvement clear

Say you do a test at the beginning of a unit – many get a high proportion of questions correct but certainty is low. Certainty scores provide a big margin for improvement that simply wouldn’t exist with a simple mark-scheme.

Dual scoring will undermine the result

If students are aware that simple scores are also considered, this lowers the stakes and, probably, lowers the honesty.

Dual scoring could have bad psychological impacts


Simple scoring ignores misconceptions and reporting this alternative score allows pupils to deceive themselves about their level of ability. Those with high self-esteem will choose the mark-scheme that portrays them best whilst those with low self-esteem may allow this to reinforce negative self-image. We can avoid this by having one rulebook and sticking to it.

Breadth and depth

Yesterday I had a very thought-provoking interaction with @powley_r, who kindly took the time to have a look at the prototype and provided some interesting feedback.

I mulled this over for a while. Christodoulou’s post on MCQs does feature some great examples (as does Joe Kirby’s) but I think questions like these serve a slightly different purpose to the certainty-based approach I’m advocating.

Domain sampling = Breadth

A traditional MCQ (or in fact any closed question) is generally used as a sample of a much larger domain of knowledge. By asking a series of closed questions, sampling knowledge from across a domain, you can make a fairly accurate estimate of the proportion of the domain that the pupil knows. This is a useful exercise, but there are some limitations.

  • The marking is binary (right/wrong) and does not differentiate between mastery and guessing/insecure knowledge. Mastery is defined in terms of breadth of knowledge and ability to comprehend questions.
  • Probability undermines the assessment of easier and harder concepts:
    • Direct questions about easy concepts are often avoided because they are guessable – and this often distorts the sampling exercise.
    • Even with 4 options, the chance of a successful guess is 25% – and pupils can often narrow the choice down to 2, making it 50%.
  • Devising complex MCQs that are both fair and testing is very difficult (I should know – it’s a large part of my day-job!)
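To put a number on the guessing problem: with binary marking, the binomial distribution tells you how often pure guesswork reaches a respectable score on a short quiz. A quick sketch (the 10-question quiz length is just an illustration):

```python
# Chance of reaching a given score by pure guessing on an n-question
# quiz, with probability p of guessing each item (0.25 for 4 options,
# 0.5 once a pupil has narrowed each item down to 2).
from math import comb

def p_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more correct out of n by guessing alone."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(round(p_at_least(5, 10, 0.25), 3))  # 0.078: half marks, 4 options
print(round(p_at_least(5, 10, 0.5), 3))   # 0.623: half marks, narrowed to 2
```

So once options can be narrowed to two, a pupil who knows nothing scores half marks or better most of the time – which is exactly the distortion certainty-based scoring is designed to remove.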

Certainty = Depth

The certainty-based scoring approach controls for guessing. It can be applied to any closed question and works for a much broader difficulty spectrum than a simple right/wrong score. However, its real strength is the insight it can provide on core concepts – the ones you need every pupil to know in order to progress.

Scoring for certainty gives much more insight into the ‘depth’ of pupil knowledge. This is the type of thing we are often trying to test via open questions, i.e. do they understand these concepts well enough to apply them to this problem/creative task?

Simple closed questions treat knowledge as a binary (know/don’t know) and this leads to inaccuracy. By measuring certainty, we get a much more accurate and reliable result (as consistently demonstrated in controlled studies).

When to use

I think there is probably a place for both approaches, not least because assessing and feeding back about certainty can be emotionally intense. Binary-marked closed questioning is a quick and easy way to measure domain coverage and produces easy-to-read data; assessing certainty gives more insight into levels of mastery, but on larger assessments some of that insight may be too much. Both are underused (in English schools), to the detriment of teacher workload.

The benefits: Proposition #1

I’m having an evening off from coding so, instead, have decided to do a quick blog…

One of the challenges of introducing something completely new to schools (at least, as far as I know) is helping people to understand the value that it brings. I previously listed 10 reasons for using certainty-based assessment but it’s rather academic. Now I’m going to start working this into a pitch (a ‘value proposition’ as sales-types would put it) focusing on three core needs. Here is my initial draft for proposition 1.

1. Assessment that genuinely informs teaching and learning.
You would have to have been living on another planet for decades to have missed the promotion of Assessment for Learning. The principle is not one that anyone argues against – assessment with meaningful feedback is undoubtedly a powerful driver of learning. In practice, the implementations have been extremely variable, with a general trend towards less rigour and formality than is used for ‘summative’ assessments. During my teacher training, for example, it was implied that ‘Assessment for Learning’ could be used to describe pretty much any activity that had some element of feedback.

What do you really know? offers an assessment method that is great for integrating into learning sequences and delivers the type of feedback that can drive learning in a specific and useful way.

Pupils receive instant feedback on their answers where it is clear what the next steps are – whether it means rehearsal and practice to embed a concept or unlearning/relearning something they have misunderstood. There is always more to strive for – a higher score to achieve – until the pupil is confident in their knowledge.

Individual test results give teachers a previously unavailable insight into what their pupils have grasped and what they have not. This allows the teacher to pitch lessons much more accurately to build on the level of understanding in the room. It also enables targeted interventions to deal with those misconceptions that undermine progress.

Finally, rigorous and granular assessment helps those trying to understand trends at cohort level enabling them to improve learning in the long term. The detail from these assessments provides new evidence to inform the design of curricula and learning sequences.


[originally published on 28/5/15]

So, here it is. The first prototype for the tool that I think could revolutionise assessment in schools.

If this is entirely new to you, please review this brief background:

  • I’m proposing that we use technology to deploy automated assessment that is more than basic quizzes. It uses a technique to differentiate between secure knowledge, uncertainty and misconception
  • This gives much more insight for both teachers and pupils
  • The method is sound but only valid for assessing recall of objective fact.
  • Ultimately, the aim is that teachers can create and deploy their own assessments

Here are some notes specific to this prototype:

  • This prototype is a working demo of the assessment format cobbled together from easily available parts
  • Most of the core features are in place but there’s lots of refinement still to come
  • Reporting functions will come in the future
  • I’d really appreciate feedback – a link to a feedback form is provided at the end
  • If you are interested in taking part in the pilot phase, do let me know on the form
  • It would also be really useful if you could share this with more teachers. Please share the link to this page so they can see the context.

Go to the prototype


10 reasons

10 reasons that I’m convinced Certainty-Based Assessment (CBA) should be a widely used method in learning and teaching…

  1. Closed-answer questions are just too guessable. 
    All teachers recognise the problem with closed questions. This is why we use open questions: we have to ask pupils to ‘explain’ so we can gauge how securely they understand concepts. However, evaluating the answers to open questions is effortful and subjective.
  2. We move on from foundation knowledge too quickly.
    As soon as learners have some grasp of a concept, they are likely to be able to guess the answers to simple questions. As a result, we move on from foundation knowledge prematurely and start to ask questions about more obscure knowledge.
  3. CBA is proven to be more accurate and reliable.
    Ultimately, assessment is an attempt to measure what is known. There is a large body of research showing how CBA has a much stronger predictive power than regular (right/wrong) assessment. This is true whether the subsequent tests use certainty-based marking or conventional mark schemes.
  4. It is fairer.
    Conventional closed-question assessment only differentiates between right and wrong answers, CBA differentiates between secure knowledge, guessing and misconceptions. In doing so, CBA gives credit to those with secure knowledge and does not group those who recognise they are unsure with those holding misconceptions.
  5. CBA provides useful feedback to learners.
    Conventional assessment can leave learners unclear about what they need to do to progress. CBA will reward additional effort that goes into mastery (and provides a wake-up call to those harbouring misconceptions!). 
  6. CBA provides useful insight for teachers.
    It tells you how securely (or not!) your learners know things. This helps you identify which areas need most attention and also helps you to accurately predict how pupils will perform in summative assessments.
  7. Online CBA is automated so that no time is wasted on marking.
    Workload is a very real issue for teachers. New technologies and methods we adopt need to, at the very least, not add to workload. CBA may actually reduce workload by providing a level of insight that was previously only achievable through detailed analysis of responses to open questions.
  8. CBA is efficient, quick and objective.
    CBA is only appropriate for measuring recall of objective fact, but it does so incredibly efficiently.
  9. CBA is engaging.
    Learners are most likely to be engaged when tasks are both accessible and challenging. The mark scheme of CBA gives learners control of the stakes so that assessments are challenging to learners at all stages.
  10. CBA is easier to create.
    The raised stakes in CBA mean that even basic questions (e.g. True/False) will effectively differentiate between those who know and those who guess. This significantly reduces the complexity of writing quizzes.