Ofsted: Overseeing the Tyranny of Testing
-bench peer who advocates greater trust of public service professionals. Miliband said it was time for the government to step back. Where previously the Ofsted inspection system had been geared to policing every aspect of a school’s performance, including using lengthy lesson observations to judge teaching quality, now it had to become more focused. The key, he said, was for inspectors to look at the ‘outcomes’ achieved by schools for their pupils, rather than worrying too much about the methods they used to bring about any improvements in these end measures. He added that it was necessary to consider whether in-
Both of these statements dovetailed neatly with two key government priorities at the time. First, they fitted with the seemingly wise mantra of delivering more money to the front line of public service reform (in this case, the classroom itself) rather than to sup
Second, and more fundamentally, Miliband’s claims matched a drive across the civil service for it to focus not on micromanaging how public sector institutions go about improving their provision, but simply on holding them to account for the results they achieve for those they serve. This also appears sensible. The great danger for a public sector which lacks the focus on the bottom line that characterises private firms, many within government now argue, is that money is pumped into the system but wasted on bureaucracy, with little end product for the users of public services. As Matthew Taylor, former public services adviser to Tony Blair, wrote recently: ‘Poor performance [and] a loss of focus on outcomes are endemic vulnerabilities for big institutions, however laudable a system’s objectives and methods.’
In education, Miliband’s concerns have been translated into an inspection system which is much more focused on pupils’ test and exam results, as the ‘outcome’ measure for the schooling system, than it was in previous years. And it is my contention that this has been hugely damaging. It is helping to turn education even further towards a bleak and narrow vision that sees its defining purpose as being to maximise the next set of test scores. Yet the assumptions on which this rests are both simplistic and questionable, while the exam results data which now drive most inspections are often unreliable and vulnerable to manipulation.
The first question to consider, in evaluating the current inspection regime, is to what degree pupil test and exam outcomes now influence the verdict which each school receives from inspectors. That is, how much does children’s success in the national tests they must sit at seven, 11 and 14, and GCSEs, A-
Results have become much more significant since inspections changed in 2005, in line with Miliband’s proposals. In the 13 years from the introduction of Ofsted in 1992 to 2005, inspections followed a well-worn pattern. Inspectors spent several days in a school, forming views of its quality by watching teaching, looking at pupils’ work, analysing their test and exam results and talking to staff and children, before writing up their judgement. Since 2005, the inspection scenario has changed dramatically. The process now starts with schools providing a pre-inspection report, which consists of their own analysis of their strengths and weaknesses. Inspectors then go into the school and spend a much shorter time than under the old regime (typically, in a secondary school, a day and a half) checking whether the school’s verdict is correct. Crucially, before doing so, they will have conducted their own desk-based checks on the school’s qualities, in which the results of its pupils in national tests (the statutory assessments all children have to sit at ages seven, 11 and 14) and exams, mainly GCSEs, A-
In early 2006, the Association of School and College Leaders, the secondary heads’ union, started reporting that inspectors were arriving at many schools having already made up their minds on what their verdict would be, based solely on the school’s test and exam result data. Ofsted, embarrassed that its regime of school visits might be seen as unnecessary, took action. It warned its inspectors in the spring of 2006 that results statistics, while ‘informing’, should not ‘determine’ their judgements. This remains its position.
Yet, two years on, the complaints from heads remain. One cheekily wrote: ‘This is no way to assess our pupils’, and suggested that inspectors should simply short-
Thankfully, it is no longer necessary to rely only on anecdotal evidence in checking the veracity of these claims. Ofsted itself now provides data on all of its inspection verdicts in recent years. An analysis of these judgements shows just how close the link is between a school’s test and exam results and its overall judgement. Ofsted visited 6,331 primaries in 2006-07, the last academic year for which results are available. Of these, 98 per cent received the same overall inspection verdict as they did for ‘achievement and standards’. This latter judgement is based on pupils’ test scores, and is only one of six main sub-headings within each inspection. The other sub-headings focus on children’s personal development; the quality of teaching; the curriculum; care and guidance offered to pupils; and the strength of the school’s leadership. Among secondary schools, the apparent link between exam results and the overall verdict was almost as strong, with 96 per cent gaining the same summing-
Ofsted now uses a four-point judgement scale: outstanding provision is rated 1, and inadequate 4. In not one of the 7,612 schools visited that year did the overall judgement differ by more than a single grade from the grade the school was given on the basis of its results. Figures for 2005-06, the only other year on record since the introduction of the new Ofsted regime, suggest a similar link. Yet the statistics show a far weaker association between Ofsted’s verdicts on other aspects of school life and the overall outcome. For example, only 41 per cent of primary schools received the same overall judgement, in 2006-
The emphasis of the inspection system on results statistics stands to be even further accentuated in future, with the promise that schools with good scores might go six years between inspections, while those where exam results are low will be visited every year. Indeed, Ofsted even admits the centrality of test and exam results to its inspectors’ overall verdicts on schools. When I put to Ofsted the strikingly high correlation between the judgement reached on test results and the outcome of inspections, the inspectorate replied: ‘We would expect these two grades to be the same, or very similar, in the vast majority of inspections. This is because achievement is arguably the most important of all the grades. Other aspects of the report—personal development… leadership and management—all contribute to how well learners achieve.’
In fact, while the current data-driven inspection regime may fit an ideology which says public services are to be defined almost completely in terms of the outcomes they achieve for those who use them, and while it may be relatively cheap, it brings with it a host of problems. There are two aspects to this. The first could be characterised as the effect on schools’ behaviour of an inspection regime that puts such weight on improving exam scores. It is to accentuate test-
The introduction of school league tables in the early 1990s under the Conservatives, followed by New Labour’s launch of targets for school improvement and test-orientated performance pay for teachers, means that even without Ofsted in its current form, teachers would be very focused on improving test scores. English pupils face more centrally monitored tests than their counterparts anywhere else. In most primaries, children encounter a government-designed test at the end of years two, three, four and five, before the major Sats hurdle: the Key Stage 2 tests in English, maths and science. In the four-month run-up to these tests in May, data from the Qualifications and Curriculum Authority reveal that schools spend nearly half the teaching week, on average, on test preparation. Meanwhile, non-tested subjects such as history, geography and music receive less curriculum time. Then, in secondary schools, pupils spend most of Year 9 preparing for Key Stage 3 tests in English, maths and science, before embarking on GCSE and A-level courses for which they can now expect final exams almost every term. In the coming two years, new modular GCSE courses which allow re-sits and examining to be staged over the two-year course and yet more tests, ‘functional skills’ exams designed to respond to employers’ concerns about school-
Does this define a good education? Well, it is fair to say there are many who have doubts, not least the university admissions tutors who are presented with the products of this regime and who have, as a 2005 report by the Nuffield Foundation suggests, grave worries about the benefits of an exam-driven system. The report, based on focus group work with 250 university representatives, said: ‘Narrow accountability based on exam success… needs to be avoided. This leads to spoon-feeding rather than the fostering of independence and critical engagement with subject material.’
An inspection system which says, in effect, that school success depends on pupils’ scores through
The government argues throughout, in defending its system of school accountability of which inspections are a key strand, that it does not encourage profess
The second problematic aspect of the modern inspection regime is whether the results that the tests and exams generate provide useful and reliable information about the quality of education they are meant to assess. It is not always clear that good test results equal good teaching. There is, in fact, copious evidence that test scores can be boosted by short-term test preparation or cramming (often repetitive practice of questions similar to those likely to appear in the forthcoming test), which does little for students’ long-
Ironically, some of the best evidence suggesting that the above question could be answered in the negative comes from Ofsted itself, in annual reports published before the introduction of the latest inspection regime. David Bell’s chief inspector’s report for 2004-05 said, of Key Stage 3 English, for example: ‘In many schools, too much time is devoted to test revision, with not enough regard to how pupils’ skills could be developed in more meaningful ways.’
For maths, Ofsted concluded for the same year: ‘National test results continue to improve but this is as much due to better test technique as it is to a rise in standards of mathematical understanding.’ In science, a report for the Wellcome Trust this year, based on a survey of 600 teachers and focus group interviews with 74 of them, found that pupils were being turned off science by the two terms of revision they received in the run-up to the Key Stage 2 tests they take at 11.⁶ Yet, focus group members said, ‘test preparation in its current form contributed little to pupils’ understanding’, while most teachers did not trust the test results as verdicts on their pupils’ underlying abilities, partly because of the hot-
The statistical formulae on which the Ofsted inspection framework sits can also be manipulated, so that the outcome may say more about a school’s ability to play the results ‘game’ than about the underlying quality of the service it provides for pupils.
Two examples best illustrate this. First, many schools have had to become adept at focusing on a narrow band of pupils, known widely as ‘borderliners’, who have the most potential to improve an institution’s headline statistics. In primary schools, this is the group of children who are identified as being on the cusp of achieving the government benchmark of level four in the Key Stage 2 tests. In secondaries, those at risk of narrowly missing a level five in the Key Stage 3 tests, or a C grade at GCSE, are also the focus. Routinely, now, schools give these pupils extra attention in terms of after-
Second, secondary schools can choose to push their pupils towards GCSE
Parents’ views are also marginalised by a system which now rests so much on statistical representations of what constitutes a good school. Under the old arrangements, schools had to send out a parental questionnaire in advance of the inspection. Inspectors then collated the findings, published them in their report and, crucially, also explained their position when parents’ views differed from those of the inspect
In fact, there is little space for this in the new reports, which offer much sparser information on school quality than was possible before 2005. In my recent book on the test regime, Education by Numbers, I compared two Ofsted secondary school reports from 2002, under the old regime, with two from 2006, under the new. The old reports weigh in at 50 and 61 pages respectively, against five pages each for their 2006 counterparts. In both 2006 reports, almost the entire summary of the school’s effectiveness, from the quality of the school’s curriculum to the pastoral care it provides, relates to test data. What is left unmeasured in the results statistics on which the new system rests? Well, extra-
If one accepts Ofsted’s justification of the new regime, however, this is not so. For all aspects of school life, it argues, contribute to pupils’ (test and exam) achievements. They are thus captured indirectly through test data, since a pupil who is given a rounded educational experience and who is enjoying his or her school life is more likely to succeed. This might sound a persuasive argument in theory. But the idea that every aspect of school life can be captured and measured through the statistical formulae of exam success is, I would submit, simplistic and naïve. Neither would common sense suggest that every life-enriching experience a pupil has at school will have an immediate pay-
Yet schools are being judged in this way. One primary head teacher, whose school failed its inspection in late 2005, put it this way: ‘In every section of the inspection report we were criticised for the same thing: standards (i.e. test scores). In “teaching and learning” the reason we got a 4 (the lowest category) was because standards were not good enough… The care and support we gave children was down because our academic support (as measured by test scores) was “inadequate”. And my “leadership and management” was down because the statistics were inadequate. In every section, we were damned because of poor test results.’
In fact, Ofsted’s argument fits the theoretical rationale which was used to justify the current structure of the inspection regime, rather than being based on the reality on the ground. The assumption is that all aspects of education contribute directly to immediate exam success, and that pupil outcomes matter more than the means used to achieve them. Yet I would argue that the means by which students achieve good grades are hugely important. A pupil who has managed to gain a particular level in a test at age 11 at the cost of a narrowed curriculum and months of repetitive question practice has not received an educational experience I would want for my children, if I were a parent.
Inputs, in terms of the quality of teaching as distinct from the ‘outcomes’ it generates for pupils, are important in this context. Education, I would contend, has value in itself, not just in terms of the immediate exam success it generates for pupils. Any inspection system, then, has to find a way of assessing the quality of teaching not simply through outcome statistics. An obvious way to do this would be to return to the old system of much more direct observation of lessons.
Test results can only ever be a proxy for good teaching. And they are a limited one, because they test only a proportion of the curriculum. For example, pupils’ speaking skills are not assessed in English exams until 16, while in science experimental work is not assessed in any government test until GCSE.
There is one final objection to the argument that Ofsted inspectors are right to base their verdicts to such a large extent on schools’ exam results. Although there might seem to be some logic in the notion that public services should be judged on their ability to ‘deliver’ better outcomes for those who use them, the generation of good exam results for pupils differs from other measures of public sector success. For, unlike, say, the success rate of a surgeon on the operating table or the ability of companies to get their trains to run on time, good school results depend on the consumers themselves playing a key role. Indeed, exams were originally conceived wholly as a way of assessing the qualities of the pupil, rather than his or her school. Pupil motivation and effort, then, have always been thought to be a key element in securing good marks.
In trying to make exams, now, much more of a verdict on the quality of those educating the child, inspectors are underlining the view that improvements in test and exam scores almost have to be achieved for pupils come what may. In this way, student agency is down
A quotation from an academic in the Nuffield Foundation report also reflects the knock
When league tables were introduced, schools could at least take the view that they would not become ‘exams factories’ focusing relentlessly on test success. If results were slightly lower in consequence, at least parents could be the judge of whether or not the trade-off was a price worth paying. Now, under the new Ofsted inspection regime, schools are facing a choice of going down the better-grades-at-all-cost route, or potentially being failed by inspectors impatient with any action which does not maximise pupil achieve