Testing a lousy way to hold schools accountable
The problem with a lie--even one that everyone agrees with--is that eventually you can't ignore the truth. Enron can't paper over its debt, and it crashes. The housing bubble pops. Now, as the rush toward using standardized test scores to evaluate teachers turns into a retreat, it might be time to face that standardized tests are a lousy way to hold schools accountable.
We've been using bubble tests to hold schools and students accountable for a long time, mostly without anyone asking tough questions about whether the scores were valid measures. Controversy over student testing was slow to develop and then mostly concerned the number of tests and the harsh consequences. We never asked whether the thermometer really measured the temperature, even though our education system is based upon the validity of these tests.
That's why it's a shock to see the sudden blowback against using these test scores to rate teachers. The snake oil in question is what advocates call the Value-Added Method. They assume, without being sure about the details, that you can use student test scores to measure a teacher's effectiveness, allowing schools to get rid of the dead wood some believe is holding American kids back.
Education reformers have been trying this for years. Michelle Rhee, chancellor of the DC public schools from 2007 to 2010, used VAM to rate not only teachers but custodial staff as well. Bill Gates gave a Florida county $100 million to start rating teachers with test scores, and some teachers of non-tested subjects were rated on the scores of students they had never taught.
But again, criticism of VAM focused on the excesses of governmental stupidity, the Orwell-meets-Kafka quality of it all. To most people, it still made sense. At least it did to Education Secretary Arne Duncan, who wanted everyone to use it. Normally, states run education policy, but No Child Left Behind contained a bizarre, Lake Wobegon requirement that every single child in America should test on grade level by 2014. Duncan would let states off that hook on the condition that they implement VAM.
This is when statisticians pointed out, in language that probably sounded stronger in a faculty dining room, that there was just one, tiny problem with VAM: It was junk science. Using student test scores to measure what effect a teacher had in a classroom was like using body fat percentage to pick a Super Bowl winner.
Of course, they used tweedy phrases, saying that "such estimates are far too unstable to be considered fair or reliable" (Board on Testing and Assessment of the National Research Council of the National Academy of Sciences) and that VAM had "too many pitfalls to making causal attributions of teacher effectiveness" (Educational Testing Service's Policy Information Center) and were "too imprecise to support some of the desired inferences" (Rand Corp.).
What really seemed to shake things up was an April report by the American Statistical Association, which said that because VAMs were based only on standardized tests, they were 10 pounds of hooey in a 5-pound bag. And if you're inclined to want the details, here's the phrase that pays: "Most VAM studies find that teachers account for about 1 to 14 percent of the variability in test scores."
States began rebelling, saying they couldn't make the unworkable work. Duncan withdrew Washington state's waiver when the state threw up its hands, but now even the Gates Foundation, once a major driving force behind VAM, wants to stop using tests to rate teachers and students for at least two years, and Texas, where all this testing madness started, wants to wait a year before implementing VAM. Apparently it's not easy doing the impossible.
Without question, teachers unions played a big role in stopping VAM. It kind of makes you wish that the people elected to represent students had asked similar questions before imposing high-stakes testing on our schools. If they're inclined to start asking questions, here's one: If teachers account for only 1 to 14 percent of the variability in test scores, then what does the other 86 to 99 percent measure?
And if we don't know what it means, why are we holding schools, students, and teachers accountable to it?
© 2014 Jason Stanford