These are the percentages of children achieving Level 4 or above in reading, writing and maths:
- 2013: 77%
- 2012: 70%
- 2011: 58%
- 2010: 69%
- 2009: 77%
- 2008: 76%
As David wrote, “The agenda for school improvement has to move away from endlessly pouring over data looking for patterns that don’t exist. We need to find new – better – ways to hold schools to account and come up with new definitions of what school improvement means.”
Steve Adcock, Deputy Director of Academies at United Learning, a 34-strong academy chain, has taken issue with what I wrote, saying the following:
“I’m no statistician, but by Marwood’s reasoning I presume we’re unable to criticise the poor records of individual hospitals and clinics, let alone doctors and nurses, since any given hospital “is too small to make meaningful generalisations.” And what about a dangerous stretch of road which causes several accidents – “too small to make meaningful generalisations” – or a restaurant with a nasty hygiene record: “too small to make meaningful generalisations”?”
As Steve says, he's not a statistician, so I'll unpick a few of the points here. The main issues are: a) what drives the numbers we collect? and b) how much random variation should we expect? The two are often conflated, in education and elsewhere. I’ve written about the first issue before, suggesting that the simplistic assumption that pupil outcomes are directly related to teacher inputs needs to be questioned. And it is surprising how often people simply don’t appreciate what statisticians have come to understand about using data to summarise complex real-world situations: specifically, that many sets of data show natural random variation.
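To make the second point concrete, here’s a minimal sketch in Python of how much a small cohort’s results can swing through chance alone. The numbers are assumptions for illustration - a cohort of 30 pupils, each with a fixed 71% chance of reaching Level 4 - and nothing about the imaginary school changes from one year to the next:

```python
import numpy as np

rng = np.random.default_rng(1)

COHORT_SIZE = 30   # assumed size of a single-form primary cohort
TRUE_RATE = 0.71   # assumed fixed chance of any one pupil reaching Level 4

# Six years in which the school itself does not change at all.
for year in range(2008, 2014):
    passes = rng.binomial(COHORT_SIZE, TRUE_RATE)
    print(year, f"{100 * passes / COHORT_SIZE:.0f}%")
```

Run it a few times and you will see year-to-year swings of ten percentage points or more, produced by nothing except the luck of the draw.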
In the case of a restaurant’s hygiene record, for example, the only factors driving the numbers are the actions of the restaurant’s owners, managers and staff. Nothing the customers do can meaningfully affect the data; even in the extreme case of sabotage by customers, hygiene is, ultimately, the responsibility of the restaurant. Because the hygiene record is completely within the restaurant’s control, we would not expect, or accept, any drop in standards. We would not expect to see random variation in the data, because the data is entirely within someone’s control: it should all be 100% perfect.
When it comes to dangerous stretches of road, things become much more complicated. Now we have to consider how the road is used, how it is designed, the geography in which it sits and the actions of those using it. It’s too simplistic to say a road ‘causes several accidents’, because the factors creating the danger are more complex. This is why dangerous roads show natural variation in the number of accidents which happen on them.
Yes, it would be possible to improve the design of the road to limit accidents. But accidents will happen, to coin a phrase. That’s why you see data which looks like the graph below, which doesn’t have a pattern in the sense in which I used the word. In coming years we could see between 0 and 5 deaths and between 6 and 15 serious injuries without any surprise whatsoever. Shock, disappointment, grief - yes. But there is no pattern. It’s clearly a dangerous road, and that’s not spotting a pattern; it’s an observation about the fact that people die and are injured whilst using it.
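The road example can be sketched in the same way. Here I’m assuming, purely for illustration, that the dangerous stretch produces an average of 2 deaths and 10 serious injuries a year, with incidents occurring independently (a Poisson model); the figures are mine, not drawn from any real dataset:

```python
import numpy as np

rng = np.random.default_rng(2)

DEATHS_PER_YEAR = 2     # assumed long-run average of deaths on this stretch
INJURIES_PER_YEAR = 10  # assumed long-run average of serious injuries

# Ten years on a road whose underlying danger never changes.
for year in range(1, 11):
    deaths = rng.poisson(DEATHS_PER_YEAR)
    injuries = rng.poisson(INJURIES_PER_YEAR)
    print(f"year {year}: {deaths} deaths, {injuries} serious injuries")
```

The counts bounce around from year to year even though the underlying danger is constant - which is exactly why reading a ‘trend’ into two or three years of accident figures is so risky.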
You could generalise from the results you have by saying that, in the case of Seaside Primary, the percentage achieving Level 4 in RWM in any given year sits somewhere between 57% and 86%. You could say that each cohort seems to be ranged around an average result of 71%. But to say that something went ‘well’ in 2013 and 2009 and ‘badly’ in 2011 isn’t sensible, and you couldn’t say that results ‘improved’ in 2009, ‘went down’ between 2009 and 2011, and then ‘rose’ from 2011 to 2013 without implying that something happened at each ‘turning point’ to cause those changes. That’s the danger, and that’s the point I’m making.
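For what it’s worth, here is where a band like that can come from. I can’t be certain this is how the 57%-86% range was derived, but taking the six published results and going two standard deviations either side of their mean gives a very similar band:

```python
import numpy as np

results = np.array([77, 70, 58, 69, 77, 76])  # 2008-2013 figures from the list above

avg = results.mean()       # about 71.2%
sd = results.std(ddof=1)   # about 7.4 percentage points (sample standard deviation)

# Mean plus or minus two standard deviations comes out at roughly 56% to 86% -
# close to the range quoted above, and the sort of spread a single year could
# plausibly land in without anything about the school changing.
print(f"{avg:.1f}% +/- {2 * sd:.1f}%  ->  {avg - 2 * sd:.0f}% to {avg + 2 * sd:.0f}%")
```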
Clearly, in some schools, things occasionally do go drastically wrong and student outcomes are extremely low compared to other schools. But that isn't spotting a pattern either; it's seeing an anomalous result which needs to be explored further. It might just be a highly unusual cohort. It might be a significant change in the school. It might be many things. It would be highly unwise to generalise and to leap to the conclusion that it must be something which the school has done, given that subsequent years are likely to see better outcomes due to simple regression to the mean.
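Regression to the mean is easy to demonstrate with the same sort of sketch. Below I’m assuming (again, invented numbers) a thousand statistically identical schools, each with a 30-pupil cohort and the same 71% underlying rate; I then pick out the school with the worst result in year one and look at what happens to it the following year:

```python
import numpy as np

rng = np.random.default_rng(3)

N_SCHOOLS = 1000
COHORT_SIZE = 30   # assumed cohort size
TRUE_RATE = 0.71   # the same underlying rate for every school (assumed)

# Two years of results for schools that are, by construction, all identical.
year1 = rng.binomial(COHORT_SIZE, TRUE_RATE, N_SCHOOLS) / COHORT_SIZE
year2 = rng.binomial(COHORT_SIZE, TRUE_RATE, N_SCHOOLS) / COHORT_SIZE

worst = year1.argmin()  # the school with the lowest result in year one
print(f"worst school, year 1: {year1[worst]:.0%}  ->  year 2: {year2[worst]:.0%}")
print(f"average across all schools: {year1.mean():.0%}")
```

The ‘worst’ school almost always bounces back the following year, not because anyone intervened but because its terrible year was largely bad luck in the first place.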
Steve Adcock continues: “There is so much more to a school than its Progress 8 score, but if my kids were at a school where they made half a grade less progress than their peers in other schools, I would want to know about it, especially if this lack of progress occurred year after year. Of course it wouldn’t mean that the school’s leaders and teachers were uncaring or incompetent, but I dare say they could be doing a few things better.”
The problem with this is that in virtually no schools do you see a ‘lack of progress (which occurs) year after year’, as Stephen Gorard has shown in papers such as “How Unstable are ‘School Effects’ Assessed by a Value-added Technique?”. As Gorard says, “The study asks how many schools with at least 99% of their pupils included in the VA calculations, and with data for all years, had VA measures that were clearly positive for five years. The answer is - none. Whatever it is that VA is measuring, if it is measuring anything at all, it is not a consistent characteristic of schools.” Gorard also looked for schools with five years of negative Value Added and found: “Out of 2,897 schools (), only two had 99% or more pupil coverage for all years, with five successive years of apparently negative CVA. No schools had five years of apparently positive CVA.”
Steve concludes by saying, “Twenty five years since the emergence of league tables I think we’ve finally got a system (thanks to Progress 8 and tough terminal exams) which shows up real school improvement. I hope that Ofsted and Amanda Spielman continue to challenge schools which persistently fail to ensure that students depart with their pockets full of decent grades.”
I couldn’t agree more, but let’s make sure that we use the information we have to establish whether schools really are persistently failing, and to look carefully at why that might be the case, rather than simply mistaking the natural variation in student scores - which we know we should expect - for a pattern in the data.