It’s time for Ofsted to inspect without test score data

29/10/2014

Having criticised the way in which test score data is used by Ofsted, I was invited to meet Mike Cladingbowl, Sean Harford and various statisticians to discuss Ofsted’s use of test scores. I was asked how I thought Ofsted should move forward, given the manifest problems with using test score data to hold schools to account. This post is my response.

What is the purpose of the current inspection regime?

When, in 1992, Ofsted began inspecting schools and publishing inspection reports, little was known about schools by those not working within them. That cannot be said of today. Schools publish all manner of information about what goes on within their walls. Government websites publish performance tables, with enormous amounts of data on subjects ranging from recent test scores to ‘Teaching staff and Education support staff expenditure’. Government-funded organisations such as the FFT provide governing bodies with mountains of information, and those overseeing schools are asked to be much more accountable for what happens in school. The accountability structures currently in place aim to hold schools to account in ways which would have been inconceivable in a previous era.

So why send in inspectors to look even more closely at what a school does?

There are a number of clear reasons to have an independent inspection of a school. Governing bodies may have been failing to hold those running their school to account. Head teachers/senior management teams may have made decisions which are bad for their schools and those within them. Teachers may be doing a poor job. Children and parents may not be satisfied with the school. Things may not be what they seem.

So what should inspectors be doing? What checks and balances do we need? What can be reasonably expected of a short inspection period such as we have currently?

I would argue that we need independent inspections to check:

1) Whether governors are holding head teachers to account.
2) Whether head teachers/senior management teams are working well.
3) Whether children, parents and staff are happy with their school.

Astute observers will notice that this is somewhat different to the current inspection regime which passes judgements on:

Overall Effectiveness
Achievement of Pupils
Quality of Teaching
Behaviour and Safety of Pupils
Leadership and Management

Those who have read my previous work will be unsurprised that I think Ofsted’s current focus should be altered to reflect the dawning realisation that test score data cannot be used in the way it has been up until now. Whilst the inspection framework relies on assumptions about the reliability and validity of test score data, inspection cannot be fit for purpose.

Making Inspection fit for purpose

I suggest three different ways in which we could go.

1) Carry on regardless

If things continue as they are, 20% of schools will be judged to be providing an education which is less than ‘Good’, as defined by Ofsted. Non-selective schools serving the disadvantaged will be more likely to be in this 20%, as ‘Achievement of Pupils’ in these schools will of necessity be compared unfavourably with schools serving the advantaged.

Regardless of whether Inspectors are conscious of it or not, their judgements of ‘Achievement of Pupils’ are driven by a flawed understanding of test score data. Since this grade is highly correlated with the ‘Quality of Teaching’ and ‘Overall Effectiveness’ grades (over 95% in agreement), schools are effectively judged on test score data.
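The bracketed figure presumably describes how often two grades match across reports, which is a simpler quantity than a correlation coefficient. A minimal sketch of that exact-agreement calculation, using a hypothetical helper `percent_agreement` and invented grades rather than any real Ofsted data:

```python
def percent_agreement(grades_a, grades_b):
    """Percentage of reports in which two judgements award the same grade.

    Hypothetical helper: compares grades report by report and counts
    exact matches only (a 2 against a 3 counts as disagreement).
    """
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two equal-length, non-empty lists of grades")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return 100 * matches / len(grades_a)


# Invented grades (1 = Outstanding ... 4 = Inadequate) for five reports
achievement = [2, 3, 2, 4, 1]
teaching = [2, 3, 2, 3, 1]
print(percent_agreement(achievement, teaching))  # 80.0
```

With agreement that high, knowing one grade tells you the other in almost every report, which is the sense in which the other judgements end up resting on test score data.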

Schools which are badly run and inadequately held to account will not be identified by this system. Teachers are given no encouragement to identify problems in their schools, since a poor grading by Ofsted will directly impact on their working conditions. Inspectors will make erroneous judgements on the ‘Quality of Teaching’ based on a flawed understanding of data and superficial impressions of a school’s context.

2) Stop grading Achievement of Pupils

Achievement, as defined by Ofsted up to now, is a measure of the amount of progress children make within a given school.

This is hugely problematic, since it seems to be dawning on those in positions of influence within the education establishment that attainment data for a school – the raw test scores pupils are awarded – are a function of the children in a cohort, and they can only be said to represent that cohort and nothing else. There are no ‘trends’ from year to year, as each cohort is unique, and one may as well weigh the children and compare their mean weights.

Furthermore, since the children in a given school can’t be said to be drawn from a wider population, it does not make any sense to compare their test scores to those of the wider population as the school cohort is not an independent sample of those who have taken the same tests.

Some observers have accepted that small cohorts can often have wildly fluctuating mean results because of the variation inherent in small samples. These observers tend to assume that larger samples do not suffer from similar problems, and we are starting to see ‘three year trends’ being reported as a solution to the small numbers problem. This does not tackle the core issue, however. If the cohort, or school, is representative only of itself and not of all those taking a test in a given year, any comparison based on the Central Limit Theorem is meaningless. All that can be said is that a cohort has a mean which is either above or below any national mean. There is no ‘significance’ test which has meaning.
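The ‘small numbers problem’ these observers have in mind rests on a random-sampling model. A minimal simulation sketch, assuming hypothetical scaled scores with mean 100 and standard deviation 15 and treating every cohort as an independent random draw from one national population, shows why that model makes larger or pooled cohorts look like a fix: the spread of cohort means shrinks roughly as 15/√n. It illustrates the very assumption the paragraph rejects, not a valid way to judge a real school.

```python
import random
import statistics


def spread_of_cohort_means(cohort_size, n_cohorts=2000, mean=100.0, sd=15.0, seed=1):
    """Standard deviation of simulated cohort mean scores.

    Assumption (hypothetical, and the one criticised in the text): each
    cohort is an independent random sample from a single national
    population of scores distributed Normal(mean, sd).
    """
    rng = random.Random(seed)
    cohort_means = [
        statistics.fmean(rng.gauss(mean, sd) for _ in range(cohort_size))
        for _ in range(n_cohorts)
    ]
    return statistics.stdev(cohort_means)


# Compare the simulated spread with the textbook standard error sd / sqrt(n)
for size in (10, 30, 90, 300):
    simulated = spread_of_cohort_means(size)
    print(size, round(simulated, 2), round(15 / size ** 0.5, 2))
```

Under this model a cohort of 10 swings far more than a cohort of 300. If a school’s pupils are not a random draw from the national population, however, neither figure says anything about the school.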

Additionally, there is no way to ascertain what has contributed to any increase in relative test scores. It might be the school, but it might be shadow education, family involvement or lack thereof, or any number of other factors. So Ofsted are in no position to make any objective assessment of ‘achievement’ or ‘progress’ as it is currently defined.

Therefore, ‘Achievement of Pupils’ should be seen as a redundant measure, in the same way that graded lesson observations - which Ofsted agreed (with admirable speed, earlier this year) cannot be assessed objectively - are no longer used in inspections.

The government’s Performance Tables provide test score and other data for those who wish to judge schools using numbers. Allowing parents to draw their own conclusions seems preferable to the current situation, in which dubious conclusions are drawn from numbers which are fuzzy, biased and often Not Even Wrong.

3) Stop grading Quality of Teaching

This judgement has been shown to reflect the ‘Achievement of Pupils’ judgement in the majority of Ofsted reports. If the Achievement of Pupils grade is removed, this judgement should also be discontinued.

It is simply not possible for Inspectors to make an objective assessment of the Quality of Teaching in a two-day inspection visit. In 2013, we were told that Inspectors did ‘lesson observations with senior leaders to agree the quality of teaching. And triangulate this with the progress that students are making. We look at books too, and check the quality of feedback.’ Now that lesson observations cannot be used to ‘agree the quality of teaching’, Ofsted is trying to limit the impact of the vacuum this has left.

Michael Tidd has written eloquently on this in his recent post, ‘Teaching today: not enough evidence; too much evidencing’. As Michael says, ‘All the time Ofsted are criticizing schools for failing to evidence things, or praising those schools who excel at producing evidence, other school leaders will feel compelled to continue to demand that work be evidenced.’

However this judgement grade is reworded, it will always be problematic. Ofsted must trust head teachers/senior management teams to run their schools, and governing bodies to hold head teachers/senior management teams to account, rather than try to micromanage what happens in classrooms.

Does test score data have a role in assessing schools?

It is clear that some people high up in government education circles still believe that test score data has some role in schools. Even within schools, there are those who think that test scores are useful.

The past twenty years have seen vast improvements in schools’ test results, which could be seen as a vindication of their use to ‘drive up standards.’ I have some sympathy with this position. I can understand that those who don’t work in schools – and especially those who don’t work in ‘difficult’ schools – think that teachers need some kind of stick to encourage them to have high aspirations for the children in their charge. There may indeed be some teachers who have ‘low expectations’. This is not my experience, and it is not what research into teacher aspirations suggests.

It is for governing bodies to put pressure on their senior management teams, and head teachers to work with their teaching teams, all with the goal of aiming as high as possible.

Nationally, a ‘floor target’ makes some kind of sense – as long as the grading structure of terminal exams doesn’t make the task impossible. But at school level, it simply doesn’t make sense to set ‘floor targets’ at, say, 85%, when a single child represents much more than one of those percentage points: in a class of 30, each child is worth about 3.3 percentage points. 85% of 30 children – a typical Year 6 cohort – is 25 and a half children out of a class of thirty, so at least 26 must reach the target. That means that just five children have to have a bad day, or a difficult life, or any one of hundreds of problems which have nothing to do with their teachers or schools, for the school to be said to be failing. That’s wrong and makes no sense.
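The arithmetic behind that paragraph can be checked with a few lines. This is a minimal sketch with a hypothetical helper, `floor_target_headroom`, assuming the floor is a whole-number percentage of pupils who must reach the expected standard:

```python
import math


def floor_target_headroom(cohort_size, floor_pct):
    """Return (points per child, minimum passes, maximum failures allowed).

    Hypothetical helper: assumes the floor is a whole-number percentage
    of the cohort that must reach the expected standard.
    """
    points_per_child = 100 / cohort_size
    # Partial children cannot pass, so round the required number up
    min_passes = math.ceil(cohort_size * floor_pct / 100)
    max_failures = cohort_size - min_passes
    return points_per_child, min_passes, max_failures


per_child, passes, failures = floor_target_headroom(30, 85)
print(f"Each child is worth {per_child:.1f} percentage points")  # 3.3
print(f"{passes} must reach the standard; {failures} may not")   # 26 ... 4
below = failures + 1
print(f"With {below} below, the result is {100 * (30 - below) / 30:.1f}%")  # 83.3%
```

Four children can miss the standard; the fifth takes the cohort to 83.3%, under the 85% floor.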

Finally

The options are therefore:

1) Carry on regardless
2) Stop grading Achievement of Pupils
3) Stop grading Quality of Teaching

As ever, comments are more than welcome.

7 Comments

Michael Tidd

29/10/2014 10:39:21 am

I think you're absolutely right that the real focus of Ofsted needs to change completely. I would much rather see an inspectorate focussed on evaluating the success (or otherwise) of local accountability and some form of inter-school accountability and support as suggested by the NAHT's 'Instead' model.
Performance tables already serve the function of highlighting attainment, etc.

Rebecca Stacey

29/10/2014 02:31:11 pm

I think you're spot on - this focus on cohorts' progress and attainment is harming our schools. The pressure on teachers and school leaders is only going to get worse as the new set of assessment 'descriptions' takes hold.
However, the most interesting point is your first one - when Ofsted was set up, schools were so much more 'closed' - data came from Ofsted and pretty much Ofsted only. A new focus is needed - this is a great discussion to have.

Jack Marwood

30/10/2014 10:55:07 am

Thanks for this, Rebecca, and for the comments you have made. I agree with you that the increased availability of information about schools has changed the game considerably - and we do need to discuss where we should go next!

@chemistrypoet

29/10/2014 04:31:46 pm

Another excellent blog post.

There is some meta-confusion with respect to progress across the education system. On the one hand, Ofsted and Government claim that standards have risen, but on the other they say that they haven't risen enough. Levels and GCSEs have been discredited, and we have another new curriculum. Ofsted inspections have waxed and waned, and continually transformed and changed. Likewise, expectations of schools have also shifted and changed. There is no clarity on what we are trying to achieve across the system, or how we can achieve it. A pause is required.

Significant re-thinking is needed. The fundamental question of what is sustainable for schools and teachers needs to be at the heart of the re-think; it is as important as trying to unravel what the education system is for. Routine Ofsted inspections should cease until these questions have been answered.

Teacher 60152

30/10/2014 10:45:59 am

Ofsted is having a hard time of it lately, what with the Norfolk inspection scandal, and now the exposing of its data usage in RAISEOnline and the Data Dashboard as being seriously inept.

Questions that come to mind are:

1) Why did no-one at Ofsted realise the incorrect use of statistics?

2) If Ofsted knew their analyses were severely lacking, why did they use them and promote them so widely?

3) Why were people in schools interpreting this data in wholly incorrect ways over the entire country, and no-one noticed until now?

4) Will these practices be promptly stopped?
4b) If not, why not?

5) How many people have had careers ruined or damaged through misuse of statistics?

6) Should data only be trusted to an individual in a school who actually has some knowledge of how to use it PROPERLY?

7) How many people have made a good living talking complete bollocks about data in schools, and what will they do now?

8) Is anyone going to apologise?

Those last two are rhetorical. Sort of.

30/10/2014 11:07:11 am

Thanks very much for these comments, Teacher 60152. I must say that your questions have crossed my mind too. At least things seem out in the open now, and those in positions of authority are reading this, and will have to deal with these issues, one way or another...

Simon Hepburn

1/11/2014 04:59:04 am

This is an excellent article that addresses the real elephants in the education room. However, I would argue that we should reframe the role of inspectors further. They should check...

1) is the school safe for students and staff (the food hygiene inspection test - looking at behaviour as well as staff turnover)
2) are parents and students happy with the quality of education (satisfaction surveys / number of applications)
3) how much of a learning organisation is the school - does it innovate and learn (whether internally or externally) and does it have a long term plan for improvement.

Any school that scores poorly on these would then be given support, not punishment!
