by Dr. Mark H. Shapiro
"A strong conviction that something must be done is the parent of many bad measures." ....Daniel Webster
Commentary of the Day - May 5, 2007: The Assessment Sting. Guest commentary by Poor Elijah (Peter Berger)
Please note: Any resemblance between modern standardized test scores and actual achievement is purely coincidental.
Sometimes in a persuasive argument it's tactically shrewd to let your opinion sneak up on your audience. That way everybody seems to be objectively reaching the same conclusion together.
The frank declaration provided above should serve as my notice that I'm not employing that tactic here. I have a point of view when it comes to our present obsession with standardized testing, and you might as well know it in advance. You also ought to know that if you disagree with me, lots of powerful people, including President Bush, are on your side.
To avoid misunderstanding in another direction, I'm not the kind of teacher who hands out happy faces. My students take lots of quizzes and tests, they write lots of essays, and I hand out lots of letter grades. These exercises give them practice explaining themselves and supporting their ideas, a noble-sounding and important educational goal. But they also demonstrate what my students know and what skills they possess, a worthwhile objective that for a few decades has regrettably been less fashionable among reform educators and experts. Philosophical fashion aside, it's important for me so I know how fast to go and what to teach next, and it's important for everybody else so they know how well a particular student is doing over the course of a term.
Since not all teachers maintain identical standards for A's and F's, broadly administered standardized tests can help compare students in different schools and states, provided the students in those different schools and states have been taught the same things in time for the same tests. If a school doesn't teach dividing fractions until fifth grade, a standardized fourth-grade test that includes that skill would inappropriately rate that school as deficient. Similarly, an April question on the New Deal would leave eighth graders drawing a blank if they don't study FDR until May. This was the case on a standardized test my school administered for years.
Timing isn’t the only problem. Federal officials recently panned eighth graders' knowledge of American history when "fewer than a third could completely describe the steel plow's historical role in improved farming." I teach a comprehensive, traditional survey of U.S. history. I'm alarmed at many students' ignorance of their past, too, but the steel plow isn't a topic I spend much time on. I suspect I'm not the only history teacher out there who could come up with a more essential essay question for a national assessment. Multiple disagreements like this will always plague standardized tests, as well as proposals for a national curriculum.
These flaws were less of a problem before No Child Left Behind when we had more modest expectations for standardized assessments. NCLB mandates that schools make "adequate yearly progress" toward the impossible goal that every student be academically proficient by 2014. That progress is supposed to be determined through a bureaucratically nightmarish schedule of assessments, unsynchronizable deadlines, and arbitrary sanctions that threaten both funding and local control of schools. The result has been the loss of irreplaceable classroom hours, literally days and weeks of instructional time diverted to testing, and a torrent of unreliable data.
These penalties have school officials in a cold sweat, and not necessarily because their schools are bad. You don't have to be a bad school to fail. According to a senior RAND analyst, the current assessment regime doesn’t identify "good schools" and "bad schools." Instead, "we’re picking out lucky and unlucky schools." A Brookings Institution study found that "fifty to eighty percent of the improvement in a school’s average test scores from one year to the next was temporary" and "caused by fluctuations that had nothing to do with long-term changes in learning or productivity."
So much for calculating adequate yearly progress.
Part of the problem is that students commonly don't care about standardized tests. Some randomly color in the answer bubbles, and I've scanned enough of my own students' responses to know that many don't take standardized essay questions as seriously as they take those on my history tests. The federal government agrees. In a brilliant, tax-funded burst of insight, officials discovered that "students who reported it was important to do well…scored higher than students who reported it was not very important."
Even when students make their best effort, the scoring process chronically fouls things up. Modern assessments are frequently scored subjectively. The NECAP test, used in Vermont, New Hampshire, and Rhode Island, requires often scantily trained scorers to distinguish between writing that has a "general purpose" and an "evident purpose," or a "strong focus" and a focus that's "maintained throughout." Is the essay "intentionally organized," or "well-organized and coherent"? You decide, and then tell me your score is objective and meaningful.
Unreliable scoring is one reason Congress's Government Accountability Office described data "comparisons between states" as "meaningless." It's why CTB/McGraw-Hill had to recall and rescore 120,000 Connecticut writing tests. It's why New York discarded its 2003 Regents math exam scores. A few years back North Carolina was forced to recalculate its "faulty" statewide scores when test designers decided they'd "set the passing scores too low." Around the same time here in Vermont we "revised" our scores when they "seemed higher" than they should have been, which gives you some idea how arbitrary "standardized" testing can be. Not to worry, though. Our scores were only off by twenty percent. At least, we think they were only off by twenty percent.
A 2006 think tank survey of twenty-three states found that thirty-five percent of school districts had experienced "significant" scoring errors. Illinois's statewide achievement tests were themselves riddled with thirteen errors. In Texas over four thousand 2003 sophomores were incorrectly retained. Students in a third of Connecticut's districts "got the wrong scores" in 2005, while Alabama incorrectly passed and failed entire districts.
The GAO reported that a majority of state and district officials sampled nationwide experienced problems with "unreliable student data." In 2003 the National Board on Educational Testing and Public Policy documented a dramatic escalation of "undetected human error." A Boston College study found that the frequency of scoring "blunders" has "risen dramatically," and that "the future is ripe for the proliferation of human error." According to a 2006 Education Sector article, the problems and errors are "likely to get worse."
The price tag for this debacle is billions, but the cost in classroom time and school resources is even dearer. Schools and students need to be held accountable, but not with an assessment system that can't account for itself.
© 2007, Peter Berger.
Peter Berger teaches English in Weathersfield, Vermont. Poor Elijah would be pleased to answer messages addressed to him in care of the editor.
The IP responds: Like Poor Elijah, the IP is of the opinion that high-stakes, standardized tests are of limited value in determining what an individual student has accomplished or how effective an individual teacher might be. Standardized tests are of value in determining on a statistical level the effectiveness of our education efforts over the long haul. The National Assessment of Educational Progress (NAEP) is one example of a worthwhile testing program, in the IP's opinion. But even these tests have their problems and limitations. The current emphasis on using standardized tests to measure the progress of individual students is bound to fail for all the reasons that Poor Elijah raises. Additionally, it is having the effect of nationalizing curriculum in a very narrow fashion that seems to be driving many teachers from the profession.