Reflections on the final report of Lord Bew’s KS2 assessment inquiry

Sunday, July 3rd, 2011

I should begin this blog post with a note of slight regret. It gives me no pleasure to be writing something which is critical of the Bew report, especially given the courtesy with which Lord Bew treated me in giving evidence to the review. He invited me to do so, and even wrote me a handwritten note to thank me afterwards.  The review’s interim report, published in April, was, I thought, a largely impressive synthesis of evidence on this subject which gave me hope that, whatever the outcome and whatever the constraints of the remit, the issues would be given a thorough and fair weighing in the final report.

Yet, I am afraid, despite some impressive passages, the report really does not do justice to what is, I think, an incredibly important subject.

I say this mainly for three reasons.

First, the report fails to follow through on what is said, at least in the foreword to the report, to be the first priority for the assessment and accountability system: ensuring that such a system supports children’s learning. Second, it misrepresents the evidential position on the effects of test-based accountability in a fundamental way. And third, it does not address in any meaningful sense a central criticism of test-based accountability: that test results are being used for too many purposes and that key purposes can be at odds with one another (my italics, since this was the bit that was not meaningfully considered).

To deal with the first problem, Lord Bew says in the report’s foreword:

“We would like to be quite clear that throughout this process we have always focused on how best to support the learning of each individual child.”

If this had been the overall goal of the review, I would say “fantastic”. The trouble is that, although this is set up as an aim in the foreword, the approach is completely absent from the report itself, where the quality of the learning experience resulting from accountability – what, if anything, is happening in lessons as a result of test-driven accountability? – really gets only glancing consideration.

This becomes clearer when we look at the report’s consideration of evidence.

The report says: “Strong evidence shows that external school-level accountability is important in driving up standards and pupils’ attainment and progress. The OECD has concluded that a ‘high stakes’ accountability system can raise pupil achievement in general and not just in those areas under scrutiny.”

Well, I wrote in detail here about the OECD evidence on which Bew drew for this statement.

I do not think anyone reading that report in full could believe that it provides a ringing endorsement of an “English”-style accountability system. Consider, as I mentioned in that blog, the fact that that OECD report says: “Across school systems, there is no measurable relationship between [the] variable uses of assessment for accountability purposes and the performance of school systems.”

Moreover, although Bew says “the OECD has concluded that a ‘high stakes’ accountability system can raise pupil achievement”, with “high stakes” in quotation marks, in fact the phrase “high stakes” only occurs once in the main text of the 308-page OECD report which Bew references here, and its use does not back up the claim made here. (“High stakes” in the one instance referenced in this report refers to any qualification which is high stakes for a pupil, by which criterion the A-levels I took in the 1980s – which were low stakes for my school – would count but today’s Sats would not).

As I wrote in an article for the TES based on research for the NAHT, in fact there are many education systems which are not doing a demonstrably worse job than England and which do not have “high-stakes” accountability of the English kind.

If Bew’s claim that this type of accountability “is important in driving up standards and pupils’ attainment and progress” is to be understood as meaning that it improves education in a more general sense than simply improving test scores, which must at least be considered if the quality of pupils’ learning is really what matters, then the report needs to consider more evidence.

Yet this section of the report, entitled “the impact of school accountability”, includes no studies raising concerns on the issue of test-driven schooling. It highlights only research which supports it.

This section then simply ends: “We believe the evidence that external school-level accountability drives up pupils’ attainment and progress is compelling.”

This is an absolute travesty of the evidential position. I would say that, given that I wrote my book on this subject from 2005 to 2007 seeking to put together all the evidence I could find on the effects of this system. Negative effects were not hard to come across: detailed concerns about the side-effects were coming to me naturally virtually every week around that time in my work at the Times Educational Supplement. To repeat, none of this evidence gets a mention in the section of the report where Bew is deciding whether or not high-stakes accountability is a good thing.

That is a shocking indictment of this final report. For all the evidence commented on in the interim report, it undermines any claim that this subject has been considered in a truly open-minded way.

If the evidence had been considered, weighed and a conclusion reached that the claimed advantages of hyper-accountability outweighed the claimed negatives (taken seriously and considered in detail); or if a conclusion had been reached that the current system, though imperfect, should be retained because changing it in a fundamental way would present too many difficulties, well, at least that would have been more honest. To try to claim that the evidence points entirely in this single direction is simply wrong.

Other inquiries to have raised deep concerns about test-driven schooling in recent years have been the Children, Schools and Families select committee’s assessment investigation of 2007-8, plus its subsequent probe into the national curriculum; the Children’s Society’s Good Childhood Inquiry; and the exhaustive Cambridge Primary Review. Sir Jim Rose, in conducting his own national curriculum inquiry for Labour, which was barred from considering assessment, described it as the “elephant in the room”, in terms of the impact on the curriculum.

Consider some of the claims made in evidence to these various reviews.

The Mathematical Association told the select committee inquiry: “Coaching for the test, now occupying inflated teaching time and effort in almost all schools for which we have information at each Key Stage, is not constructive: short term ‘teaching how to’ is no substitute for long-term teaching of understanding and relationship within and beyond mathematics as part of a broad and balanced curriculum.”

The Cambridge Primary Review reported one witness to the review, citing her experience as an English teacher, primary head and English examiner, as condemning “the ‘abject state of affairs’” where reading for pleasure in schools “has disappeared under the pressure to pass tests”.

The Independent Schools Council told the select committee’s curriculum inquiry: “National curriculum assessment should not entail excessive testing. Universally, a focus on testing was found to narrow children’s learning, teachers’ autonomy and children’s engagement in learning.”

Ofsted also told the select committee that “In some schools an emphasis on tests in English, mathematics and science limits the range of work in these subjects in particular year groups.” An Ofsted report on primary geography from January 2008 found that “pupils in many schools study little geography until the statutory tests are finished”, while an Ofsted report on music said “A major concern was the amount of time given to music. There were examples of music ceasing during Year 6 to provide more time for English and mathematics.”

The OECD itself said, in the education section of its report on the UK in March this year, that: “Transparent and accurate benchmarking procedures are crucial for measuring student and school performance, but “high–stake” tests can produce perverse incentives. The extensive reliance on National Curriculum Tests and General Certificate of Secondary Education (GCSE) scores for evaluating the performance of students, schools and the school system raises several concerns. Evidence suggests that improvement in exam grades is out of line with independent indicators of performance, suggesting grade inflation could be a significant factor. Furthermore, the focus on test scores incentivises “teaching to tests” and strategic behaviour and could lead to negligence of non-cognitive skill formation.”

Either Bew has, then, defined “attainment and progress” in such a narrow sense – ie it means “there is compelling evidence that test-driven accountability drives up test scores” – that the report’s claim to be interested in the learning of each child more generally cannot bear scrutiny (since it is only interested in the evidence of test scores).

Or improving “attainment and progress” is meant to stand for the quality of education as a whole improving as a result of “high-stakes” test-based accountability, in which case Bew has simply chosen to ignore that section of the research on this subject which conflicts with the way the review was framed by the government.

The report does, then, move on to “concerns over the school accountability system”, including “teaching to the test”. But it offers no detail of what the evidence says as to what this might mean for the pupil. The only substantial concern acknowledged here is the unfairness of the way results indicators are used for schools, which it says its recommendations will go on to tackle. This is an important argument, of course, but it is not the same as the claim, widely made, that the system of test-based accountability damages the learning experience of at least a proportion of pupils.

The only acknowledgement of this claim here is when the report says that many heads feel they “‘need’ to concentrate much of Year 6 teaching on preparation for National Curriculum Tests in order to prevent results dropping”. Bew then acknowledges that “the accountability system to date may appear to have encouraged this behaviour [my incredulous italics at the weakness of ‘may’, when heads face losing their jobs if results fall]”.

The report reacts by simply arguing that this need not happen: schools can get good results without narrowing the curriculum. That is exactly the conclusion of the last major report to look at this subject: the 2008 “expert group” report on assessment for Ed Balls as schools secretary.  That report suggested running a campaign to persuade teachers not to teach to the test, since there was simply no need.

Although teachers have argued with me that a good professional does not need to teach to the test, I’m afraid I think of this, when I read it in official reports, as the ostrich, or head-in-the-sand position. It is unscientific, I believe: the fact that some teachers and schools buck the trend does not negate the existence of the trend. The National Strategies have, in the past, encouraged teaching to the test, so presumably they thought there was some value in it for schools, in terms of improving results. I suspect local authorities have also promoted a great focus on the content of the tests in schools where the data just has to improve. Overall, the incentives of the accountability system certainly push at least a proportion of schools towards test-driven teaching and thus, if one truly wanted to change this, it would be a good idea to look at changing the way accountability works, rather than effectively simply telling teachers not to follow what for many of them will be its logic.

Then the report closes down the debate, saying simply: “Given the importance of external school-level accountability, we believe publishing data and being transparent about school performance is the right approach.”

In other words, because the review team had already decided that the evidence of the beneficial effects of external accountability was “compelling” – ie without presenting any research on negative impacts – that was the end of the matter. There was no consideration of the actual impact on children’s learning during test preparation, and the nature of it.

Incidentally, because the review team believes that “high-stakes” accountability – ie making results high stakes for schools – works, it must then also believe that assessment should drive what goes on in schools, since the philosophy must be that making assessment results “high-stakes” for schools forces them to improve the quality of education they provide.  

The third problem of the report is related to this, and I don’t want to use too much space going into it in detail here. But in essence it runs as follows. Bew really ducks another criticism of test-based accountability: that test results are used for too many purposes, and that because of this, testing as currently constituted serves many of these purposes less than well.

I’ve put the second bit in italics, because Bew really doesn’t consider this implication. Essentially, Bew accepts the widespread claim that assessment data are put to very many purposes, but reacts to this mainly by listing the “principal” purposes to which they are already put, and then saying other uses should be considered as “secondary”.

It is, I suppose, at least an attempt to consider this issue. But the problem is that the purposes suggested as central by Bew include both that data should be used to hold schools to account, and that it should provide good information on the progress being made by individual pupils, for the benefit of those pupils and their parents. Bew’s claim, in the foreword, that test-based accountability should support children’s learning should also be borne in mind here, for that must be another guiding principle if taken at face value.

The problem with the report is that, arguably, the contention at the heart of this debate is that the use of data to provide information on a school’s – and on teachers’ – performance can conflict with its use both to support pupils’ learning and to provide the best possible information on the quality of that learning.

This is a big part of what the many people who, Bew acknowledges, submitted evidence to the review mean when they say that the problem is not the tests, it is the league tables which are constructed on the back of them. Because teachers are worried about their school’s results, they take actions which, while right in terms of boosting results, may not be supporting the best learning experience for the child, or their long-term educational interests. And the very act of teachers directing so much attention at the tests and results indicators may also, paradoxically perhaps, make them less good measures of underlying education quality, an argument implicitly acknowledged in the report in a section where it says many secondary teachers do not trust KS2 Sats results because of the extent to which pupils have been prepared for the tests.

In other words, the purposes – and even these “principal” purposes – are in conflict. A report which took seriously the washback effects on learning, from the child’s point of view, of the accountability system, would look much more closely at each of these aims to try to ensure that the requirements of accountability do not conflict with the aim of providing the best possible education experience for pupils.

Some alternative proposals, not backed by Bew, have tried to look at re-engineering aspects of the system to stop some of the purposes conflicting in ways which look either harmful for pupils, or which give us less good data than we might want.

For example, the suggestion put forward by many that national education standards could better be monitored through a system of assessing a sample of pupils rather than through testing every child comes because the purposes to which the current testing system is put are felt to be in conflict. A sampling system, with a relatively small number of pupils being assessed and each on differing parts of the curriculum, would allow information to be collected, potentially, across a much wider and deeper spread of aspects of the curriculum than is possible through a system where all pupils must take every test produced. And its information on whether standards were improving or falling would be more robust because, as the results would be “low-stakes” for schools, test questions could be retained from year to year to allow direct comparisons of pupil performance to be made.

These kinds of improvements in the quality of information provided are not possible in the current system because other purposes to which current national test data is put – to provide information on individual schools and on all pupils’ performance, meaning that every pupil must be tested, and papers must change from year to year to guard against schools “cheating” – make them unfeasible.

A more serious look at this subject would also have considered in detail the problems of seeking simultaneously to use test results as “objective” measures of pupil performance; to support learning; and also to hold schools to account. In 2006, a proposal put forward by Cambridge Assessment and the Institute for Public Policy Research acknowledged the problem that the purposes were in conflict: the need for schools to generate good results could lead to test-driven teaching and a narrowed curriculum, which was not an ideal form of learning. It therefore proposed a change whereby teacher assessment would become the main judgement on both pupils’ and schools’ performance, but children in each school would then be assessed through a “testlet”, measuring for each child just a small area of the curriculum. The testlet results would be used as an assurance that the accountability function now placed on teacher assessment was not leading schools to inflate their results. In other words, it retained accountability but, in trying to change the relationship with tests in a small number of subjects, attempted to stop it conflicting with the goal of supporting good learning. This idea was not considered in detail by the report.*

Another alternative, mentioned as my favourite in my book, would be to make inspection judgements the central focus of school-by-school accountability (with inspections offering a rounded look at the quality of education provided, to guard against curriculum narrowing), and to run sample tests to help provide national education quality information.

Instead of trying to look at the relationship between the purposes, Bew has simply left the mechanics of the system in place, in that assessment data is still to be used for all the main purposes it is now, including: holding schools to account, producing data on individual pupils’ performance for the benefit of them and their parents, and generating national and regional achievement data.

The report says that through its proposals “we believe we can address the imbalances and perverse incentives in the school accountability system”.

Because the review has not addressed the issue of the conflict of purposes, this idea of countering perverse incentives is, I think, a forlorn hope. Its proposals represent no significant change to the system’s fundamentals, but rather a restating of the basis of the system (which the report must implicitly believe, in its essentials, to be a good thing) and then an attempt to manage the detail.

OK, so now, finally, to turn to the concrete stuff in terms of those detailed changes recommended by the report, some of which, I think, are important.

– The report proposes moving to a system of publishing schools’ results averaged over a three-year period, to address concerns that judging institutions on single years is unfair, given the way pupil cohorts can change. Small schools, where the introduction of a few high- or low-achieving pupils can have a proportionally very large effect on results from year to year, are particularly hard hit by the current system, and their concerns would seem to have influenced this change. However, three-year averages are not recommended to replace single-year statistics, but to sit alongside them in league tables. A key consideration could be what weight they are given elsewhere in the accountability regime, including Ofsted reports and floor targets; the report does not, I think, stipulate that they should be given priority.

– Additional measures are to be introduced recording schools’ achievements counting only those pupils who completed the whole of years 5 and 6 at the school, in response to concerns that schools with lots of children arriving from elsewhere feel an effect on their results. Again, it seems these results will be published alongside the existing measures, rather than replacing them.

– The report talks about placing a greater emphasis on progress measures, alongside “raw” attainment. However, progress measures already feature in league tables, are central to Ofsted’s new systems and are included in the government’s new floor targets for primaries. So call me a cynic, but it is hard to see that the report has added much here. (Overall, my hunch is that there is very little in the report as a whole with which the government would disagree – and you have to wonder after reading this report if this was always likely to be the outcome – but one test (pardon the pun) of that will have to await ministers’ reaction to the report.)

– Teachers will submit teacher assessment judgements before pupils’ test results are known. This seems sensible to me, as it negates the risk of the test judgement influencing the teacher assessment verdict. As the report correctly states, they are measuring different things, so the judgements reached through each assessment method should be kept separate.

– Finally, the most significant change relates to writing. Bew proposes, first, the introduction of a new test of spelling, punctuation, grammar and vocabulary. I guess teachers will have views on that; I would say only that the claim in the report that these aspects of English have “right” and “wrong” answers was something some people were querying last week.

The recommendation, however, to replace the writing test with teacher assessment is substantial. As someone who went through secondary and university assessments in the 1980s and 1990s and was never assessed on creative writing in the exam hall, I have always found it strange that 11-year-olds were asked to be creative under the time pressure of Sats. I think a move to teacher assessment, then, would undoubtedly be a good thing. It could be argued that this change alone, in promoting a better assessment experience for many children, will mean the Bew review will have been worthwhile, despite some of its more fundamental findings being so flawed.

The report does, however, mention that the teacher assessment results are to be subject to external moderation. This is unavoidable, in a system which is using the scores generated to hold schools to account. Ministers, I am guessing, will want to ensure that the moderation is robust, as clearly there will be an incentive for schools to push up scores if they are under pressure over pupil achievement through, for example, the floor standards. The great danger, again, would be that the government decided that the need to use the results to judge schools was more important than providing the right assessment experience for pupils – that conflict of purposes again – and therefore moved not to accept this recommendation to move towards teacher assessment. I have, though, no evidence that this is going to happen and hope it will find favour.

Summing up, Bew’s detailed changes do stand to make some difference. But I would suggest that the arguments over the system’s underlying dysfunctionality – or not – are not going to go away. It is a shame that this final report did not take more seriously, in reaching its verdict, the detail and nature of some of the concerns.

*The report does briefly consider the merits of using tests to moderate a mainly teacher assessment system, concluding that this would not be feasible as tests and teacher assessment are not the same and thus (this, I think, is the implication) it would be wrong to view the test as providing “true” validation of each teacher assessment verdict. I would not disagree with that as an argument, but I do not think it invalidates the Cambridge Assessment/IPPR model, since the “testlets” in this case are not meant to provide a judgement on the accuracy of teacher assessment in the case of every pupil, but merely to provide a more general check that a school has not inflated its teacher assessment judgements.

1 Comment

  1. The government’s mandated position is that it is only through choice and competition (and the creation of a market) that improvement in public services of all kinds can happen.

    You can follow the same logic within health, higher education and now quite clearly in schools. A rejection of tables and testing would mean great difficulties in the ability for consumers to ‘choose’.

    Evidence therefore gives way to mandated beliefs.

    Choice theory is a very large field, mostly based in economics. The theories are significantly flawed but they will be forced to fit the political narrative.
