(This blog is about the 2013 OSDD. The 2014 OSDD was unveiled on the 6th March 2014 and uses slightly different methodology to the 2013 version. I stand by the criticisms below which were valid when they were written in February 2014)
I’ve been teaching in England for ten years. In that time my opinions and understanding about education have changed enormously. My despair at the misuse of statistics in education has not. It’s grown. Boy, has it grown. It’s massive; a huge glowing throb in my brain when I think about it. I can’t state this clearly enough: it’s almost unbelievable how misguided the use of data in education in England is.
That’s one of the reasons I started this blog. It has been hugely gratifying to see the influence of bloggers on Ofsted of late, particularly the Famous Five who visited Mike Cladingbowl recently and directly led to this advice about not grading lesson observations. Other bloggers such as OldAndrew have inspired me to put my ideas here for you to read. You may not agree with me – many argued with OldAndrew and others when they started to question the orthodoxy of ‘progressive’ teaching methods and the role of Ofsted in cementing ideas in schools – but I hope that I can influence you and others to begin to question some (often unquestioned) ideas about education. One of those ideas is the ridiculousness of much of the ‘data’ used in schools.
I could pick any number of examples. There are so, so many. But I’ll start with one particularly simple exemplar of the sheer stupidity in what passes for statistical analysis in education: the Ofsted Schools Data Dashboard (OSDD). It tells you almost nothing but pretends to tell you a great deal indeed. If you haven’t had a look at this monstrosity, you should. You really should. Pick a school you know. Any school. Click on it. And despair.
I’m going to look at Primary Schools, because they are my particular area of interest. I expect that the analysis of Secondary Schools within the OSDD is equally flawed, and I look forward to reading comments from those in a position to judge. If you teach or have children in Secondary School, please, please contribute your thoughts in the comments below.
In education, things are never as simple as they appear
As a rule of thumb, if something looks simple in education, it’s probably complex. Schools look like simple things: children, teachers, learning – easy, surely? But no. What’s the best way to teach? To learn? What makes a good teacher? What helps children to learn? Trying to distil all this down to a number on a page is a fool’s errand, but plenty of fools have clearly decided that it’s simple.
If school makes all the difference, why does context matter?
One of the first things you notice about the OSDD – other than that the acronym would have been soooooo much better had it been the Schools Ofsted Data Dashboard – is that the school you are looking at is compared against ‘Similar schools’ as well as ‘All schools’ for English, Reading, Writing and Mathematics.
Well, if context makes no difference and education outcomes are purely a function of schools, surely we should just look at how a school compares to all other schools? To do otherwise is to have low expectations of children, isn't it? Well, no. Life is somewhat more complex, after all.
Having two measures is a small concession to the blindingly obvious observation – pretty much entirely denied or ignored by the OSDD, politicians and OFSTED itself - that children are not all the same empty vessels waiting to be filled by school. Teachers and those who work with children make this point all the time: Parents, family, community – context, in the jargon - all make a huge difference to a child’s ability to progress academically. It’s my belief that context is the cake, on top of which school is simply the icing, and that ignoring this fact is making the life of many children in school a data-driven misery. But let’s get back to the data, such as it is.
Houston, we have a problem
When presenting comparisons between ‘Similar schools’ and ‘All schools’, the OSDD shows data graphically in quintiles, a nice simple ‘one of five categories’ display. Okay, that seems simplistic – because it is. Top quintile, yay! Bottom quintile, boo!
But the ‘Level 4 and above’ measure is a really blunt instrument for some schools, where every child is assessed at Level 4 or above. The people responsible for the OSDD website had to produce an update explaining their flawed methodology, because a ‘large number of schools had the same results, (and) these were placed in more than one adjoining quintile’ – i.e. schools with identical results were shown to be somehow ‘better’ or ‘worse’ than each other.
The update then clarifies that ‘schools in the bottom quintile are at least four percentage points different from those in the top quintile.’ Remember that figure. Schools in the bottom 20% can have results as little as four percentage points below those in the top 20%: 96% puts you in the bottom quintile, 100% in the top. Less than five percentage points between ‘yay’ and ‘boo’.
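If you want to see how that happens, here’s a rough sketch in Python. The pass rates are invented; the mechanism isn’t. Bunch enough schools up near 100% and the quintile boundaries squeeze together:

```python
# Illustration with invented data: when most schools score close to 100%,
# the quintile boundaries end up only a few percentage points apart.
import numpy as np

rng = np.random.default_rng(42)
# hypothetical 'level 4 or above' percentages for 10,000 schools,
# bunched up towards 100%
pass_rates = np.clip(rng.normal(loc=97, scale=3, size=10_000), 0, 100)

boundaries = np.percentile(pass_rates, [20, 40, 60, 80])
print("quintile boundaries:", boundaries.round(1))
# roughly [94.5 96.2 97.8 99.5]: a school on 94% gets 'boo, bottom 20%',
# a school on 100% gets 'yay, top 20%' - around five points apart.
```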
In data analysis, interpretation counts
In addition to the graphs showing quintiles, there is a written explanation to say whether a school’s results are in the top, middle or bottom 20%, or top or bottom 40%, just in case the simplistic graph was too difficult for you to grasp. Here’s where the interpretation begins to spin the bare ‘facts’, such as they are, to make a vastly complex world appear simple and easy to understand. If a school’s results for ‘Writing’ are, for example, in the fourth quintile against all schools, this is presented as being in ‘the bottom 40% of all schools’. It could equally be presented as being in ‘the top 80% of all schools’, but of course it isn’t. It’s a small point, but it shows how interpretation makes a big difference when it comes to number crunching. It would be more objective simply to say that the school is in the fourth quintile rather than to add the subjective spin of ‘bottom 40%’, and it is clear what the data crunchers want you to think.
Come a bit closer and be prepared for wonders
When I began to look at these data sets, two things really stood out. Firstly, the data in the graphs is for one school year, 2012. Secondly, there is a link to a ‘List of Similar Schools’ for each set of school data.
So, the first point. This is data for a single year. One year. Which, in a one-form entry Primary School with one class per year group, is around 30 children. Think about that. It’s not a lot of children, is it? Just 30. Sixty in a two-form entry Primary School, fewer in a rural, mixed-age class. But not many. Not many at all.
You know when you see an advert for a beauty product which says, ‘based on a survey of 124 adults,’ and you shake your head and say, “Yeah, right, that’s going to give you an unbiased result”? Imagine if you decided to buy a toothpaste or a shampoo – or to judge a school – on the results of fewer than 60 people. Or fewer than 30. Just imagine how much of a fool you’d have to be to do that.
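For the statistically minded, here’s a back-of-envelope sketch of just how fuzzy a pass rate measured on one small cohort really is, using the standard normal approximation to the binomial (the 85% figure is illustrative):

```python
# Back-of-envelope 95% confidence interval for a pass rate measured
# on one small cohort (normal approximation to the binomial).
import math

def ci_half_width(p, n, z=1.96):
    """Half-width of an approximate 95% CI for a proportion, in points."""
    return z * math.sqrt(p * (1 - p) / n) * 100

for n in (30, 60):
    print(f"cohort of {n}: 85% plus or minus {ci_half_width(0.85, n):.1f} points")
# cohort of 30: 85% plus or minus 12.8 points
# cohort of 60: 85% plus or minus 9.0 points
```

A single cohort’s figure comes with an uncertainty of around ten percentage points either way. Remember the four-point gap between the top and bottom quintiles.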
Let’s make the numbers even smaller
Choosing a school at random, I looked at the national levels in English, Reading, Writing and Mathematics, which are given as 85%, 86%, 81% and 84% achieving ‘level 4 or above’. Now, whilst ‘level 4 or above’ will mean something to you if you teach in Primary education, I’m willing to bet that it doesn’t mean much to many parents reading this, much less to any politician. But there you go: nationally, somewhere over 80% of 11-year-olds were given this ‘level’ in 2012, according to this school’s dashboard.
So, in a one-form entry Primary school, the national data-crunching says that 25.5, 25.8, 24.3 and 25.2 children ‘achieved level 4 or above’ in English, Reading, Writing and Mathematics respectively. Or in other words, 4.5, 4.2, 5.7 and 4.8 children didn’t, in English, Reading, Writing and Mathematics respectively. So between 4 and 6 children didn’t make the grade, between 8 and 10 in a bigger school.
It’s not many is it? A handful, if that. It gets worse. The results are only for children who were assessed at Key Stage 1 at the same school. Any child who joined the school between the ages of seven and eleven isn’t, as far as I am aware, included in the data-crunching. (In the UK, many children are born in cities and grow up outside of them; this is especially true of London, but all the big conurbations see a movement of families as their eldest child approaches secondary school age. The process of counter-urbanisation means that people tend to move away from cities). All this means that the number of children being crunched is less than 30 per class; in some schools it may be as low as 20 children. That’s one child contributing up to 5% of the results.
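Here’s the arithmetic of one child’s weight, in a couple of lines (the cohort sizes are illustrative):

```python
# How much of a school's headline percentage a single child is worth.
for cohort in (30, 25, 20):
    print(f"cohort of {cohort}: one child = {100 / cohort:.1f}% of the result")
# cohort of 30: one child = 3.3% of the result
# cohort of 25: one child = 4.0% of the result
# cohort of 20: one child = 5.0% of the result
```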
And remember, the difference between a top quintile 'yay!' school and a bottom quintile 'boo!' school could be less than 5%...
One is not equivalent to five
One result out of such a small total, especially when reduced to a percentage, is a problem from a statistical point of view. One child seriously skews the data. What if that result is from a child who is on the borderline of a level 4, is or has been ill, has been away from school for an extended period, or has had one of the many temporary setbacks school children are prone to? What if a class has more summer-born children, more boys, fewer middle-class, education-positive children, children with undiagnosed myopia or dyslexia, children in homes with no books or space to think, more children who were born prematurely, or have suffered from a debilitating childhood illness, or family trauma, or some other factor which has affected their ability to learn? Naturally, all of these factors have significant effects on children’s achievement in school. Children don’t exactly lend themselves to statistical analysis, especially when looked at in small numbers.
A primary school class taken on its own is simply too small for any statistical analysis to mean very much of anything. Occasionally attempts are made to get around this by using data from three, or even five, years. Even this isn’t particularly useful, as the data is too easily affected by outliers. But in the world of the OSDD even this is too much to ask, and schools are held up to the light based on a small number of highly individual children. This is ridiculous, and it should be repeated every time anyone discusses this data: it is not reliable in any statistical meaning of the word; there are too few, highly individual, children in any given cohort for any comparisons to be made with any other cohort.
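Don’t take my word for it. Here’s a simulation under an assumption I’ll state loudly: every school is identical, with a cohort of 30 in which every child has the same 0.85 chance of reaching level 4. Any differences between these imaginary schools are pure luck of the draw:

```python
# Simulate 10,000 *identical* schools: cohorts of 30, every child with
# the same 0.85 chance of reaching level 4. Differences are pure chance.
import numpy as np

rng = np.random.default_rng(0)
n_schools, cohort, p = 10_000, 30, 0.85

rates = 100 * rng.binomial(cohort, p, size=n_schools) / cohort
q20, q80 = np.percentile(rates, [20, 80])
print(f"bottom-quintile cut: {q20:.1f}%, top-quintile cut: {q80:.1f}%")
print(f"spread: {rates.min():.0f}% to {rates.max():.0f}%")
# Identical schools land anywhere from roughly 60% to 100%, and a fifth
# of them get the 'boo' quintile purely through bad luck.
```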
Let’s compare apples and then rank the oranges
The second thing which took my attention was the ‘List of Similar Schools’. This has clearly piqued other people’s interest, as the OSDD now has an explanation document about this. This really is quite something and clearly shows that the data in the dashboard is pretty much worthless.
As noted, I’m interested in Primary schools, so the OSDD data at Key Stage 2 (eleven years old) is what I have looked at. So, the similar schools are “defined as those with similar prior attainment. For Key Stage 2 dashboards, prior attainment is the Key Stage 1 average point score of the cohort who sat assessments/examinations in the most recent year of the dashboard.”
Okay, so the results of assessments made when children are seven years old (Key Stage 1) are used to crunch data into lists of 110 similar schools. There is no explanation as to why 110 has been chosen (Key Stage 4 is compared with just 60 schools). I assume this is because there are more primaries to work with, but, then again, who knows?
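The explanation document doesn’t spell out the mechanics, but a plausible reading – and this is only my assumption – is ‘rank every school by its KS1 average point score and take the 110 with the closest scores’. A sketch under that assumption, with invented point scores:

```python
# My guess at the mechanics (not confirmed by the OSDD documentation):
# rank schools by KS1 average point score, then take the 110 schools
# whose scores sit closest to the school in question.
import numpy as np

rng = np.random.default_rng(1)
ks1_aps = rng.normal(15.5, 1.5, size=5_000)  # invented KS1 point scores

def similar_schools(idx, scores, k=110):
    """Indices of the k schools with the closest prior-attainment score."""
    distance = np.abs(scores - scores[idx])
    distance[idx] = np.inf  # a school isn't similar to itself
    return np.argsort(distance)[:k]

peers = similar_schools(0, ks1_aps)
print(len(peers), "similar schools found for school 0")
```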
Then it gets even more fun. Schools have to submit data for every child to the Department for Education each January in the Schools Census. The data is crude, with little more than whether a child receives free school meals and their assessed levels in reading, writing and numeracy. And then the real magic happens: the data is crunched to find ‘Similar schools’ based on the submitted Key Stage 1 assessments of children in Year 2 in 2012, not the results of the Year 6 in 2012. The assessments of the 2012 Year 6s are, however, compared to those of ‘similar schools’, as well as being compared to ‘all schools’.
Take a moment to think about this. The current government model, which seems to believe that schools are solely accountable for a child’s progress in school, groups schools according to data for one group of children and then compares the data for a completely different group of children.
So if you have a group of academically able Year 6s and a group of struggling Year 2s, good for you. The other way around? Tough. Either way, to compare one group of people with another in this way is patently ridiculous and any analysis based on this data is of no worth whatsoever.
An equivalent would be to measure, say, the heights of all the 7-year-olds and 11-year-olds at a doctor’s surgery, then to group together 110 ‘similar’ surgeries based on the 7-year-olds and compare the heights of their 11-year-olds. The results would be equally meaningless.
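Here’s that analogy as a simulation, assuming (for the sake of argument) that the surgery itself has no effect on height, so the two age groups are just independent samples from national distributions. The heights are rough illustrative figures:

```python
# The surgery analogy: per surgery, measure 30 seven-year-olds and
# 30 eleven-year-olds drawn from the same national distributions.
# The surgery contributes nothing, so one cohort's mean tells you
# nothing about the other's.
import numpy as np

rng = np.random.default_rng(7)
n_surgeries, cohort = 2_000, 30

mean_7yo = rng.normal(122, 5, (n_surgeries, cohort)).mean(axis=1)
mean_11yo = rng.normal(145, 6, (n_surgeries, cohort)).mean(axis=1)

r = np.corrcoef(mean_7yo, mean_11yo)[0, 1]
print(f"correlation between cohort means: {r:+.3f}")  # hovers around zero
# Grouping surgeries by their 7-year-olds and then comparing their
# 11-year-olds is comparing noise with noise.
```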
My statistics professors had best cover their ears
The explanation document provides an acknowledgement that crunching small data sets is unreliable (or ‘unstable’ to use the not-at-all statistical terminology used in the document).
Okay, so how small does a cohort have to be in order to be deemed unstable? Bear in mind that the smallest school I have taught in had an intake of just 12 children per year. “Prior attainment data for those cohorts with fewer than five pupils is removed prior to grouping as it could be unstable.” Hello? You think that five children (a number so small that my rules of written English usage mean I have written the word rather than the numeral, that’s how small it is), yes, five children, is a reasonable data set? Kaboom! There goes my not-statistically-robust Bat Sign… No statistician in the world would accept any analysis based on this data, and neither should you. It is meaningless, and everyone at Ofsted should hang their head in shame for using it in any way whatsoever.
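To see just how coarse these tiny data sets are:

```python
# With a cohort of five, there are only six possible headline figures,
# and one child moves the school a full 20 percentage points.
cohort = 5
print([f"{100 * k / cohort:.0f}%" for k in range(cohort + 1)])
# ['0%', '20%', '40%', '60%', '80%', '100%']

# Even my smallest-ever school, with an intake of 12, moves 8.3 points
# per child.
print(f"intake of 12: one child = {100 / 12:.1f} points")
```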
So how much does this ridiculous data set matter?
By my analysis the data in Ofsted’s Data Dashboard is not statistically robust, and it should not be used to make any kind of judgement about a school. So that isn’t happening, is it? It appears that the data is indeed used to inform judgements. And what’s worse, it is being used, as far as I can tell, by Ofsted inspectors to make decisions about schools before they even enter them.
Clearly, Ofsted inspectors don’t use the dashboard itself, as they have access to the RAISEonline database (another data monstrosity which I’ll come to in later posts), but this is based on the same data as the dashboard; it offers further analysis, but the headline statistics say the same thing. And they are being used, it appears, to pre-judge schools.
Here are five random schools which were inspected in December 2013, with their Ofsted judgement and their Dashboard ‘English quintile’.
John Baskeyfield VC CofE Primary School, Inadequate, 5th quintile
St John's Church of England Voluntary Aided Junior and Infant School, Good, 1st quintile
Silverdale St John's Church of England Voluntary Aided Primary School, Good, 2nd quintile
Our Lady Star of the Sea Roman Catholic Voluntary Aided Primary, Requires Improvement, 5th quintile
Cumberworth Church of England Voluntary Aided First School, Outstanding, 1st quintile
Now, this may be a coincidence, but given that there have been many tales of schools being re-graded since the Dashboard appeared, and since many schools have dropped two grades, from Good or Outstanding to Requires Improvement or Inadequate, it gives cause for concern.
Statistics in education are always to be viewed with scepticism bordering on hostility, but the School Ofsted Data Dashboard is a SODD and is yet another example of the misuse of data in education in England.
I’d really appreciate any comments, especially from those with experience of the Data Dashboard as it applies to Secondary Schools. And remember, lots of people believing something is true doesn't make it so.