
Standardized tests have a purpose — just one

New York Teacher


The 2002 No Child Left Behind law, which mandated annual testing in grades 3–8, was riddled with unintended consequences. The most serious was that it let what were essentially low-quality, off-the-shelf commercial tests drive instruction in U.S. public schools.

A consensus is emerging across the country that testing has skidded off the rails. Using these limited instruments, good schools have been labeled failing, skilled teachers have been called ineffective, and millions of students have been subjected to scoring metrics that fluctuate wildly and inexplicably.

Yet since the federal law’s passage 13 years ago, these tests have assumed unprecedented importance. “The shift from using tests for information to holding students or educators directly accountable for scores is beyond a doubt the most important change in testing in the past half century,” wrote testing expert Daniel Koretz in his acclaimed book “Measuring Up: What Educational Testing Really Tells Us.”

Standardized tests have a place in education. As Koretz explains, the tasks, the administration and the scoring of standardized tests are uniform, so they allow for comparisons across students, schools and districts. What’s more, once results are broken out for subgroups of test takers, as NCLB mandates, they can shine a bright light on historically neglected students.

When African-American students were shown to be performing 30 percentage points lower on average than white students, or when only a tiny fraction of special education students passed the tests, U.S. education was forced to acknowledge the glaring inequalities. The law also lets taxpayers see how well schools are educating children, measured against clear, benchmarked standards.

Straying from their mission

But using standardized tests to make broad judgments about teachers, schools or school districts strays far from what these tests were built to do. Standardized tests are “summative”: they are designed to determine whether students reached a specific performance level on a specific body of knowledge at the end of a course or year. As on a licensing exam or an entrance test, the test taker is either over the bar or under it on a predetermined scale.

Standardized tests are very different from “formative” tests, such as classroom quizzes or papers, which teachers use to gauge student learning and identify areas of confusion. Formative tests yield immediate results and serve as diagnostic tools that inform instruction.

“We have some problems with these large-scale, summative tests,” said Scott Marion, the executive director of the National Center for the Improvement of Educational Assessment. “We keep layering more and more purposes on them.”

As social scientist Donald Campbell’s famous law [see box] warns, piling on purposes can corrupt a test’s mission.

Assessment Types
  • Summative: end of course; evaluative; used for accountability
  • Formative: during course; diagnostic; used for instruction

“When the cook tastes the soup, that’s formative assessment. When the customer tastes the soup, that’s summative assessment.” — assessment expert Paul Black

Test designers, or “psychometricians,” like Marion start by asking states exactly what they want a test to measure and why: its “theory of action.” Even with the new Common Core standards, states are “still requiring pretty intense testing without any theory of action,” Marion said.

What skills and knowledge are they choosing to assess? What level of mastery should students be able to demonstrate? Under what time limits? “If you as a state or district have no reason for one versus the other, you have to go back to the drawing board,” Marion said.

Campbell's Law

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

A tipping point

Standardized tests are not designed to measure growth. They can do a decent job of it, Marion said, if the tests given in consecutive years are closely matched and if there are enough student scores to reduce measurement error. But inferring a growth score from them carries risks.

Often the tests don’t yield enough data to support valid judgments, and some scores are “imputed,” meaning statistically guessed at. “There are a lot of schools where there is already a lot of missing data,” said Sean Corcoran, an education researcher at NYU. “We’ve been willing to accept that as a shortcoming of growth measures, although there comes a point where you just don’t believe the numbers,” he said. “I’m not sure what the tipping point is there.”

Leaders of both political parties now acknowledge the errors of high-pressure NCLB-era testing. The U.S. Senate education committee just unanimously passed a revision of NCLB that maintains annual testing as a way to track how effectively schools are educating poor and minority students, but eliminates the most punitive consequences for schools and teachers.

If the legislation passes the full Congress later this year, states will have the green light to put standardized tests back in their original place. Advocates also hope to see a decline in the prior law’s most damaging effects, such as a narrowed curriculum, cheating and excessive test prep. Educators could then focus on genuine teaching and learning.
