A Level Results 2020

A blog post written over the course of 13 August 2020, the release day for A Level results in a very odd year.

This blog is intended as a ‘hot take’ – an immediate reflection on the release of A Level results. I write from an HE perspective, and so might not emphasise other perspectives as strongly as others would. I am going to open with some context (the first half of the blog), which will hopefully allow international readers to follow the specific discussion of the awards process (second half).

The core issue is that A Level students (typically aged 18, the end of secondary school) could not sit their national exams because of the virus. As coursework has recently been stripped out of most A Levels, this means that grades this year have been calculated by an algorithm.

The Devolved UK Education System and Alternative Qualifications

The four nations of the United Kingdom (England, Scotland, Wales, and Northern Ireland) operate on the basis of ‘devolution’: certain central governmental powers have been delegated to the national governments.

One of these powers is education, so different nations are entitled to take different approaches to education policy. Scotland operates a system of ‘Highers’ instead of A Levels, for example, and has a much more flexible understanding of the school-HE transition. A Levels are therefore not the universal qualification of UK students, but they are the dominant qualification taken by students who go on to enter HE (i.e. they are seen as the main traditionally-‘academic’ qualification).

Even students in England might not sit A Levels. While the school leaving age of 16 funnels students through the General Certificate of Secondary Education (GCSE), there is no obligation to be in school beyond that. People aged 16-18 are required to be in some sort of training, but this might be technical training such as an apprenticeship. Loosely aligned with vocational careers is a parallel set of qualifications endorsed by the Business and Technology Education Council (BTEC). These are specialist awards related explicitly to employment. For example, there are BTEC qualifications in blacksmithing, digital games development, and agriculture. It is worth noting that BTECs are primarily assessed through coursework, and so have been less affected by the virus.

School Agendas in English Secondary Education

Schooling is heavily fragmented: it is primarily the responsibility of local councils, and it contains a mixture of public and private provision, together with some hard-to-categorise alternatives. I will attempt a potted historical overview, emphasising the core developments.

The New Labour government invested heavily from 1997 in school education as a way of allowing working class children to access middle class lives. This investment accelerated a focus on evaluating schools by metrics: statistics such as the grades attained by pupils became important indicators of school performance. In a phenomenon often called ‘Goodhart’s Law’, the descriptive indicators became targets and schools began to compete with each other. In Miseducation, Reay describes the important role of parents in this competition: extraordinary actions like moving house allowed families to win the ‘postcode lottery’ of school catchment areas, for example. More subtly, she argues that the experience of being in a class full of pupils whose families have ambition and professional networks is of huge benefit to the less privileged pupils.

The effect of metrification has been to widen the gap in quality between the average education experienced by a poor child and that experienced by a rich one. ‘Good’ schools got better, ‘bad’ schools got worse. This is an extremely important point, because any algorithm using school performance as a variable to predict grades therefore contains explicit artefacts of social class.

The 2008 financial crash provided post-Labour governments with a mandate for ‘austerity politics’: aggressive reductions in public spending to ‘balance the books’. Particularly hard-hit were local councils. Though the Dedicated Schools Grant is ring-fenced money for education, school funding has been subject to inflationary erosion. At the same time, a ‘demographic dip’ in birth rates has meant that the number of pupils going through school has been somewhat smaller for the last ten years or so.

The A Level

Typically, pupils take three A Levels. This makes the A Level system extraordinarily specialised by international standards: it requires pupils to narrow down to just three subjects at the age of 16. It is still relatively common to hear of people ‘frozen out’ of degrees such as medicine by the choices they made at 16.

A Level grades are typically awarded by reference to a normalised curve: the best-performing students get the best grades. Until 1987, this was a hard mapping: the top 10% of students got an A grade. More recently, the approach has softened somewhat; by 2012 about 25% of A Levels were graded as an A, and the recently-introduced A* grade has been awarded to around 10% of A Levels.

Note that this ‘curve’ approach contrasts with criterion-referenced exams such as the driving test: any driver good enough to pass the test gets a pass grade.
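
To make the contrast concrete, here is a minimal Python sketch of the two grading philosophies. The marks, the 10% quota, and the pass mark of 60 are all invented for illustration; real grade boundaries are set by a far more involved process.

```python
# Norm-referenced vs criterion-referenced grading, illustrated.
# Marks and thresholds are invented for this example.

def norm_referenced(marks, quota=0.10):
    """Award 'A' to (roughly) the top `quota` fraction of candidates,
    whatever marks they actually achieved."""
    ranked = sorted(marks, reverse=True)
    cutoff = ranked[max(1, round(len(ranked) * quota)) - 1]
    return ["A" if m >= cutoff else "not A" for m in marks]

def criterion_referenced(marks, pass_mark=60):
    """Award a pass to anyone meeting the fixed standard,
    however many candidates that turns out to be."""
    return ["pass" if m >= pass_mark else "fail" for m in marks]

marks = [82, 75, 75, 61, 58, 40]
print(norm_referenced(marks))       # only the very top mark earns an A
print(criterion_referenced(marks))  # everyone at or above 60 passes
```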

My own (2007) A Levels involved two stages: an Advanced Subsidiary (AS) stage in Lower Sixth/Y12 and the ‘full’ A2 in Upper Sixth/Y13. Each stage had a mixture of exams and coursework. I found A Levels really hard.

While he was education secretary, Michael Gove reformed A Levels in ways which are very relevant today. While there were exceptions, broadly the new qualifications were assessed by a single session of exams at the end of two years of learning. This is important today because it means that there are few pieces of data which can be used as the basis of grade predictions.

While it would be unreasonable to design an education system around a pandemic, it is hard not to notice that the AS-A2 system would have had a much better chance of responding fairly to this particular event. Indeed, this has been the case in Wales, where the devolved administration has kept the AS-A2 model and AS grades have been used to award minimum A2 grades.

The HE Agenda: Money

The 1997 Labour government saw HE as a part of their agenda to improve social mobility through education. Higher Education is expensive to provide, though, and this required a huge change in the way degrees were funded.

The prohibitive cost to government was defrayed in England by shifting some of the price onto students, on the basis that those benefitting directly from HE should bear some (£1000/year) of its cost. Loans for these fees, as well as living costs, were underwritten by the government on terms generous to students. Annual tuition fees of £1000 grew to £3000 in 2004, and in time tripled again to £9000; the £9000 fee has since been increased slightly to account somewhat for inflation. Under devolved education policy, Scottish students who study in Scotland do not pay fees (these are covered by the Scottish government).

The shift in funding has had lots of consequences, but for this discussion the core point is that English Universities require undergraduate fee income to survive. This has led to fierce competition for students. Much of this competition is on the basis of ‘league table’ rankings produced by the press. Importantly, one key league table metric is some measure of the average entry grades of students admitted to each University. Entry grades are therefore an important part of the survival strategy of many Universities.

In the early days of the COVID lockdown, Universities UK (the ‘voice’ of UK Universities) presented a package of measures to government which it suggested would help support the sector. The fears at the time were mostly centred on many students – and most particularly international students (who pay higher fees) – deferring places so that they could study once the pandemic was over.

In the event, most of the proposals were flatly rejected. One idea which was adopted, though, was a cap on student numbers. Universities – whose autonomy over admissions is protected by statute – volunteered to limit their home+EU student numbers to 105% of the numbers they had predicted earlier in the cycle. This is important because any mechanism for circumventing this cap becomes an enormous strategic opportunity for Universities. It is worth noting that this cap was immediately regarded as poor policy by respected actors (e.g. HEPI), in particular because (i) a 5% margin is huge; and (ii) Russell Group Universities seemed likely to lose international students, so might increase their domestic recruitment by more than 5%. It is currently unclear whether any Universities will break the 5% barrier, or whether students will be turned away from full Universities which could have accepted them in the absence of the cap.

A related concession was the moratorium on ‘unconditional’ offers: Universities offering applicants a place on the basis of their existing attainment.

A Levels during COVID – The Key Moments

A Levels didn’t happen. The UK-wide ‘lockdown’ in March of 2020 precluded any safe way for students to sit summer exams. The deliberate back-loading of assessments meant that this was a big problem, because there was no obvious basis to grade the performance of the cohort.

The government’s response to the pandemic has been widely criticised overall, but in the case of A Levels it is probably fair to say that grading students was not an immediate national priority. The Education Secretary, Gavin Williamson, was first occupied with the practicalities of issues like school closures and the mechanisms through which the children of Key Workers could be safely taught. These were not small issues.

Though the minister committed to awarding A Level grades to this cohort, the practicalities were left to Ofqual – the body which regulates qualifications. Ofqual sought additional data, and then quietly developed a way to award every student a fair grade in each of their subjects.

Scottish Highers grades were released around a week before A Level grades, and there was an astonishing public backlash to the algorithms used to award grades. Broadly, the criticism was that rich kids got better grades than they should have done, while poor kids got worse grades than they should have done.

The political consequences were pronounced. The Scottish First Minister defended her Education Secretary one day, and then revised her view the next. There was a complete U-turn in the grading policy.

Political pressure was therefore placed on Williamson, even before the release of results. Today’s release of data was preceded by panicked midnight policy announcements about a ‘triple lock’ of predicted, awarded, and mock-exam grades, and by Ofqual withdrawing from the 0815 press briefing on results (before recommitting to it).

Broadly, the main criticisms of the A Level results follow a similar form to the criticisms of Scottish Highers results: they reward the rich at the expense of the poor.

Human Stories

Twitter today has been full of stories about times when results have been grossly different to the ones expected. It is hard to judge these on the evidence provided in 280 characters, but I feel it is important to give some space to the way that the algorithm has very specific consequences upon real people.

Public Defences

Some commentary has also come out in defence of the approach. I have seen less of this on Twitter, though perhaps this is because of the people I follow.

The Ofqual Approach

Ofqual have published a 319-page report on the process they used to award grades. It is clear that an astonishing level of work went into this project, and the sheer number of appendices for special cases demonstrates a clear concern for pupils in ways which have not gained much airtime.

It is only fair to flag up that I have not read the whole report. Instead, I have focused on the executive summary and the outcomes data.

Ofqual were concerned to produce results with a ‘similar value to grades issued in any other year, so that those using them to select students (sixth forms, universities, employers, etc.) could have confidence that their worth was in line with previous years.’ They note transparently that ‘a critical factor in achieving that was maintaining overall national standards relative to previous years.’

To this end, they requested a ‘centre-assessed grade’ (CAG) from schools for each pupil in each subject. I would characterise this as a ‘predicted’ grade. They also requested a rank order of pupils: Ayeesha is better than Jimmy, who is better than Alex, who is better than…

This ranking becomes the key object of contention in many criticisms, so I present Ofqual’s rationale in full:

There were several reasons for this. First, we know from research evidence that people are better at making relative judgements than absolute judgements and that teachers’ judgements tend to be more accurate when they are ranking students rather than estimating their future attainment. The research literature suggests that, in estimating the grades students are likely to achieve, teachers tend to be optimistic (although not in all cases).

They state that analysis of the CAGs supported this expectation of teacher over-estimation, but took care to place no blame on teachers: it is a human thing to hope that your pupils will perform at their best. Specifically, they state that awarding the CAG would likely have ‘led to overall national results which were implausibly high’:

At A level, we would have seen the percentage of A* grades go up by 6 percentage points from 7.7% of grades in 2019 to 13.9% of grades this year, and the percentage of grades that were B and above increase by over 13 percentage points from 51.1% in 2019 to 65% this year.

Ofqual had no standardised national data with which to make defensible comparisons between, say, the Biology CAG of one student sitting OCR in Tower Hamlets and that of another sitting AQA in Plymouth. Committed to a standardisation protocol, their only option was to standardise at the level of the school.

Their final model – the Direct Centre Performance (DCP) model – started from a prediction of each school’s performance for 2020. The main determinant of this prediction was historical data, though there are important discussions of special cases (e.g. small schools, new schools). It seems that the CAG grades were often used in situations where it was statistically inappropriate to predict a school’s outcomes. For most settings, though, my reading of the report is that the ranking of students was used to populate the predicted grade distributions. If a school was predicted to have three A*s in History, the top three ranked students got those grades.
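
That allocation step can be sketched in a few lines of Python. To be clear, this is a toy reconstruction of the principle as I read it in the report, not Ofqual’s actual code; the names, grades, and counts are all invented.

```python
# Toy reconstruction of the DCP allocation step (not Ofqual's code):
# a school's predicted grade distribution is filled from the teacher
# ranking, best-ranked student first. All data here is invented.

predicted_counts = {"A*": 1, "A": 2, "B": 3, "C": 2}  # from historical data

# Teacher-submitted rank order, best student first.
ranking = ["Ayeesha", "Jimmy", "Alex", "Sam",
           "Priya", "Leah", "Omar", "Chris"]

awarded = {}
cursor = 0
for grade, count in predicted_counts.items():  # dicts preserve insertion order
    for _ in range(count):
        awarded[ranking[cursor]] = grade
        cursor += 1

print(awarded)
# {'Ayeesha': 'A*', 'Jimmy': 'A', 'Alex': 'A', 'Sam': 'B', ...}
```

On this reading, the CAGs themselves play little role for a large centre: the rank order and the school’s predicted distribution do almost all of the work.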

In fulfilling its aims – issuing grades which were similar to those awarded in other years – Ofqual was slightly too generous (e.g. A-or-above grades increased by 2.4 percentage points), but broadly successful when the aggregated data is considered. They note that 96.4% of awarded grades were within one grade of the CAG.
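
For readers wondering what such a figure measures, this is a minimal sketch of a ‘within one grade of the CAG’ calculation, with an invented handful of grades and a simplified points scale:

```python
# Sketch of a 'within one grade of the CAG' agreement metric.
# Grades and the points scale are simplified and invented.

GRADE_POINTS = {"A*": 6, "A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "U": 0}

def within_one_grade(cags, awarded):
    """Fraction of entries whose awarded grade is at most one
    grade away from the centre-assessed grade."""
    hits = sum(abs(GRADE_POINTS[c] - GRADE_POINTS[a]) <= 1
               for c, a in zip(cags, awarded))
    return hits / len(cags)

cags    = ["A", "B", "B", "C", "A*"]
awarded = ["A", "B", "C", "E", "A"]
print(f"{within_one_grade(cags, awarded):.1%}")  # 80.0% in this toy case
```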

For what it’s worth, I believe that Ofqual have made a solid attempt to do the best for students within the terms laid out by the Education Secretary. There were millions of CAGs, and the scale of the data set means that accomplishing even the big-picture standardisation was an enormous technical accomplishment. They were asked to protect the value of an A grade, and that’s what they did.

Description of the Ofqual Outcomes

The headline number is that about 40% of awarded grades were lower than the CAG. Ultimately, this is a hit which Ofqual is resigned to taking, as to award the CAG would have inflated grades in a way which the education secretary directed them not to do.

Some of the uncontentious historical data really surprised me. The female:male split in A Level entries is hovering at about 60:40 (page 144). Had I not blogged about this earlier this year, it might have surprised me to know that only something like 65% of A Level entries were from white candidates (page 145), or that less than 75% of entrants declared English as their first language (page 146). It was good to see that entrants with SEN attained grade distributions comparable to those without (page 147).

Entrants flagged as eligible for free school meals (i.e. from poor families) were a more complicated case. In line with previous years, their attainment of high grades was lower than that of pupils not flagged as FSM-eligible (gaps between the groups: 7.1 percentage points at A or higher, 6.6 at C or higher). These gaps were slightly larger than those in previous years.

Attainment by socioeconomic status was comparable to previous years: 28.3% of high SES entrants were awarded A or higher, whilst 21.0% of low SES entrants were.

There is one case where the algorithm has arguably created inequality rather than merely reproduced it. The proportion of A-or-higher grades awarded to independent school pupils has increased by 4.7 percentage points this year. This must be some consequence of the decision to standardise by school, though the exact mechanism is unclear to me.

Criticisms of the Ofqual Approach

Criticisms so far have come in two main flavours: criticising the aggregated results and criticising the outcomes for individual human beings. Somewhere between these are rarer criticisms of methodology, and a few well-rehearsed general criticisms of A Levels have been wheeled out as well.

I note that many criticisms do not commit themselves to an alternative option, or consider the alternatives sufficiently critically. It seems certain that relying on CAGs would not have eliminated bias but merely introduced new biases, for example.

Criticisms of Aggregated Data

The SES and FSM data demonstrate that the algorithm has faithfully reproduced existing inequalities. The creation of a bias in favour of independent schools also seems like a valid criticism to raise on analysis of the aggregated data.

This style of argument has also been raised at the level of schools: what happens when a school is really turning things around, and legitimately expects its pupils to get much higher grades?

In reproducing the inequalities, Ofqual have laid bare the profound inequity in the A Level. It is clear that A Levels do not assess all groups fairly, and one might suggest that Ofqual has inadvertently demonstrated that this unfairness operates at the specific level of schools.

On the one hand, this unfairness is part of what A Levels currently are. On the other hand, it isn’t what A Levels should be. It would be unfair to expect Ofqual to fix the system in a plague year, but it would be completely fair to expect Ofqual – the exam regulator – to regulate A Levels so that such inequity is eliminated. Pragmatically, it might be that this scrutiny holds their feet to the fire in the years to come.

Criticisms of Individual Outcomes

I find the most persuasive argument flows from the human stories. It seems that some students should have been awarded much higher grades than they were. In any given case it is unclear exactly why this happened, but the fact that such cases happened at all evidences a serious flaw: recognition of individual merit has been sacrificed to the system.

There is a suite of obvious rebuttals to any focus on individual cases. Perhaps they are very rare but over-reported on social media. Perhaps one case involves a mis-reported ranking; perhaps another derives from a student sitting on several grade boundaries and missing all of them. I remain persuaded that no fair system would permit cases like this.

It is also fair to consider whether issues like extra exam time for students with dyslexia are properly addressed by an algorithm which treats students in this aggregated way. It is unclear to me whether or not the algorithm is discriminatory.

Criticism of Methodology

The ranking approach has been the object of the most sustained criticism. It is not clear that teachers knew how important the ranking would be, or that they are trained to rank at all. There are further methodological problems in converting ordinal data (a ranking) into cardinal data (a grade). It is also unclear how a huge difference in ability is weighted relative to a small (and possibly teacher-misjudged) difference, though I understand that the ranking was carried out within groups of students sharing the same CAG prediction. Some commentators have suggested that schools should have been given the grade distributions and asked to populate them. I feel very uncomfortable with this idea, as it places teachers in a position of judgement for which they are not trained under the current national curriculum model.

Relatedly, Prof Nicola Ingram notes that:

The pupils who often surprise teachers the most with unexpectedly high exam performance tend to be those from disadvantaged backgrounds. It is difficult to conceive of the new system being able to take account of and attempt to mitigate these inequalities.

If we take this view seriously at scale (and I do), it weakens the Ofqual rationale for using ranking to minimise teacher bias. Just like CAG data, ranking data is open to substantial (if unintentional) bias.

The point about changing school profiles is also worth framing as a methodological point: in reproducing historical distributions, this approach has given improving schools worse grades.

Criticism of Remediation

The broad criticism of the mechanisms for making wrong results right is that the entire set-up is a complete mess. It is unclear what the appeals mechanism is (it will be published in early September), and the scope for Autumn resits does not seem sufficiently detailed right now. The financial cost to pupils of any such procedure is also unclear, which is an important issue because poorer pupils are more likely to get low grades.

The timescales are also a real issue. Students engaging with remediation will miss clearing. The ad hoc fix for this – allowing Universities to break their numbers cap for such students – seems likely to either complicate the HE competition for students or to simply force such students to take a whole year out.

The competition point is not trivial from the student perspective. High-tariff universities may have an incentive to refuse entry to students with good-enough-to-study grades in clearing (when they can recruit students with better grades).

Ofqual also have some astonishing guidance suggesting that challenging the grade of one student might disrupt the grades of others. This seems cut-and-dried unacceptable to many.

Broader Criticism

Well-worn criticisms about A Levels in general also find a way to contribute to this specific event. In particular, the ‘all or nothing’ late-stage assessment has been accused of being regressive.

My own biggest worry is that the pupils leaving education aren’t really being represented in the conversation. It’s fine for the Universities Minister to urge HEIs to be flexible with awarded grades, but this does not fix the core issue of the grades being bad. If the grades themselves aren’t fixed, then people with bad grades – who aren’t generally going to University – are going to get stuck with them as they apply for jobs in a recession.

Even more philosophically, it is worth considering what the point of these assessments is. The current value of a grade is not to the student, but to employers and Universities when crudely sorting applicants. Aiming at grades costs time and effort which our education system could spend in different ways, if we chose to. Is the cost worth what we could have instead?

Proposing an Alternative Approach

The ‘triple lock’ proposed at midnight by Williamson seems to me even more inflationary than simply awarding the CAGs to students, because the CAG could (in principle) be trumped by a high mock grade.

Despite this, I believe that Williamson’s triple lock is actually a fair approach (though – unforgivably – it comes much too late for many students to access clearing). It’s what I would have done, though I hope I would have done it earlier.

It is crystal clear that this would lead to inflated grades, and also that some schools have over-estimated pupil grades. It is also clear that there would be teacher biases which advantage some students and disadvantage others if the CAG were essentially driving the awarded grades. These injustices seem tolerable to me, given the awful circumstances.

A thornier issue is that Universities might have been forced to accept more students than anticipated, if they had banked on some students missing their grades. It is plausible that this would have forced some Universities to break the 5% numbers cap.

I note that the incredible rigour of the Ofqual procedure may well have deterred schools from over-estimating grades too wildly. Egregious over-estimation could be identified through the very algorithms currently used to award grades: for example, centres with a too-high proportion of CAGs lying two grades above the algorithm’s prediction might be reviewed by a human.
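
As a rough sketch of what that review trigger might look like in Python (the centre data, the two-grade rule, and the 20% threshold are all invented for illustration):

```python
# Sketch of a human-review trigger for egregious over-estimation:
# flag any centre where too many CAGs sit two or more grades above
# the modelled grade. All data and thresholds are invented.

GRADE_POINTS = {"A*": 6, "A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "U": 0}

def flag_centres(centres, threshold=0.20):
    """Return the names of centres where the share of entries with a
    CAG two or more grades above the modelled grade exceeds `threshold`."""
    flagged = []
    for name, pairs in centres.items():  # pairs: (cag, modelled) per entry
        excess = sum(GRADE_POINTS[cag] - GRADE_POINTS[modelled] >= 2
                     for cag, modelled in pairs)
        if excess / len(pairs) > threshold:
            flagged.append(name)
    return flagged

centres = {
    "Centre 1": [("A", "B"), ("B", "B"), ("C", "C")],
    "Centre 2": [("A*", "C"), ("A", "C"), ("B", "D"), ("C", "C")],
}
print(flag_centres(centres))  # ['Centre 2']
```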

The way that these A Level results have failed to support young adults is unutterably sad. I hope this motivates people to change things.

Update

I have been curating a thread of some of the interesting views on Twitter here.