Report of the UFT Task Force on High Stakes Testing April 2007
Apr 20, 2007 2:26 PM
RECOMMENDATIONS
Based on readings, presentations, public comments and our own discussions, the task force offers the following recommendations.
1. New York City must stop the compulsory administration of standardized tests every six weeks.
These mandated, often expensive, packaged tests are duplicative, not connected to what is happening in the classroom on a daily basis, and steal time from instruction. The requirement for mandatory six-week testing can only increase the perception that the role of teachers is not to teach but to test and collect data. Teachers at the forums and elsewhere have said that they learn little about students from these packaged tests that they would not have known otherwise from their own evaluations and assessments. The interim tests the DOE requires teachers to use do not always match the subject matter or skills being taught and the results often come in weeks or months after the tests are given.
Additionally, mandated interim assessments administered at the same time to all students do not necessarily provide reliable data for all students and, at times, may be unnecessary. These interim assessments generate excessive paperwork and, as we have begun to see with ECLAS and Princeton Review, diagnostic assessments are being used not only to assess but also to make high stakes decisions about students' futures, a use for which these assessments were certainly never intended, and a use for which there is no evidence of validity. Although the DOE initially stated these interim assessments were solely for internal use for the improvement of instruction, it now plans to include the results of these interim assessments in its newly developed $80 million ARIS data collection and tracking system that becomes operative in September 2007.
2. Tests and assessments are diagnostic and instructional tools. They must not be the sole determinants for student placement, promotion, graduation and other high stakes judgments.
High stakes judgments for schools and students, as mandated by city, state and federal requirements, must be based on multiple forms of evidence, not only standardized tests. Inferences and decisions that educators draw from test results must be limited to what the test was designed to measure and truly does measure. Teachers should be given access to the test publisher’s data on validity, reliability and the basis for the normative comparison or how criteria for passing were set. Professional development activities in schools should include discussions of the meaning of this material and how teachers can explain scores to parents and students.
Educators know that students develop and learn in different ways and at different rates. In order to assess student learning and make decisions about what’s best for a child’s long term academic success and emotional development, it is necessary to use a variety of means to measure ongoing student progress and performance. Especially important is measuring that which occurs on a daily basis in the classroom, in areas such as reading, writing, speaking, problem solving and critical inquiry. These are skills that are necessary for success beyond high school and are not necessarily measured on a single standardized test or a mandated interim assessment. The state and city should base student promotion on multiple indicators such as grade point averages, performance assessments, portfolios, teacher comments, attendance and test scores.
3. Do not use student test scores to evaluate teachers.
The use of student test scores as a determinant of teacher quality has a simplistic appeal: students who score high on standardized tests must have received instruction from qualified teachers, and those who score low obviously did not receive good instruction. To paraphrase H.L. Mencken, most simple solutions are also the wrong solution. The use of data from student test scores on standardized tests to evaluate teachers may appear simple and be intuitively appealing, but it is wrong.
Student scores on standardized tests provide a sample of student performance on a single exam but do not provide other important measurements of student achievement. Standardized tests do not isolate the other factors that may affect student achievement, such as class size, the ratio of adults to students, quality of facilities, availability of resources, poverty, attendance patterns, parental involvement, facility with English, prior schooling experience, or even an upset stomach on testing day. The importance of these other factors in student achievement was noted by a Chicago Public School principal, Barbara Williams, in a March 6, 2007 Chicago Tribune article, "City grade schools shine on tests." She was "devastated" by the drop in reading scores in her 5th grade, the cause of which she said was "one classroom packed with 33 children and serious behavior problems."
Teachers, principals and those familiar with tests and the science of psychometrics, a number of whom spoke at the UFT's forums on high stakes testing, pointed out that the questions on these standardized tests are frequently highly correlated with socioeconomic status. The questions that really measure students' prior knowledge and experience, that is, what they bring to school, provide the best way to ensure a “proper” distribution of scores, and they account for the subtle bias that many people find in these tests. The test scores, then, do not measure only what happened as a result of instruction but also measure a host of other variables. These scores are not valid measures of teacher performance.
Experts have also pointed to the lack of alignment between standards, curriculum and assessments. The tests and assessments New York City's students take often have no relation to what teachers are teaching. Currently, given the misalignment among standards, curriculum and assessments, the solution to raising student test scores has been the questionable instructional technique of "teaching to the test." Intensive test preparation results in a superficial covering of material and, in many cases, ultimately becomes curriculum and instruction. The noted historian and former Assistant Secretary of Education Diane Ravitch is one of many educators who have shown how the emphasis on standardized testing narrows and waters down the quality and the quantity of school curriculum, and, as mentioned in the forums, eliminates entire subject areas.
Using student test scores to evaluate teachers would exacerbate this practice. A punitive evaluation system, based on a single test, would place even more stress on teachers to raise student test scores. Although teachers at all levels are feeling this pressure, teachers and students in the elementary schools are especially sensitive to it, because elementary teachers do not specialize in one subject and are responsible for preparing students for high stakes tests in more than one area.
The increased weight that test scores carry can channel time and money away from other areas of importance. In order to maximize test score increases, instruction in subjects that are not tested (the arts, foreign languages, physical education) decreases. Even if we accept the premise that the use of student test scores is defensible, how do you evaluate teachers in those subjects where there is no test? Adding more tests is not an answer. Should we reduce, for example, arts education merely to knowing on a multiple choice test who painted Guernica or who composed Tosca, or should students learn to understand and appreciate a variety of forms of artistic expression and find out how to create their own art or music?
Teachers would have even less time to teach material that is not on the test. Test prep would continue to drive instruction, large numbers of students at both ends of the achievement spectrum would be ignored and the curriculum would narrow even more.
Test scores are not accurate enough or reliable enough to justify serious consequences. Consider how the incorrect reporting and scoring of standardized tests, as seen in the 2003 administration of the Math A and Physics Regents exams, as well as other city and state tests over the past six years, affected students and teachers. Diplomas were granted or withheld based on false scores. Most recently, parents and educators raised questions regarding the validity of test items on the 4th grade ELA exam, pointing out that flawed test designs can yield flawed results.
4. In their accountability plans New York City and State must use a variety of indicators. These indicators must recognize sustained growth over time. The city and state should not rely solely on the collection and analysis of standardized test scores or other absolute measures of performance to evaluate schools.
A variety of indicators should form the basis for the evaluation of schools; such a system would especially recognize the continuous success of high performing schools. In order to improve instruction, the state and city should use multiple forms of evidence, such as longitudinal studies of a cohort of students rather than the performance in a grade of a different group of students from year to year.
The emphasis on absolute measures for improvement, such as those New York State uses to formulate its list of the most improved schools or SURR schools, does not take into account sustained achievement at very high levels. Absolute measures that may require increases of a specific number of “points” or another indicator unfairly penalize schools consistently performing in the very high ranges of achievement. An increase of 20 points, for example, to qualify as improved might not be realistic for schools performing in the 90s range. Conversely, the current AYP formula for labeling schools as in need of improvement under NCLB penalizes struggling schools that may show increases in performance based on the indicators in use but fail to meet an absolute benchmark. A school that has a performance target of seven points will receive no credit for six points, no matter the circumstances of the school.
The AFT, along with educators, researchers, test developers and legislators around the country, has called for changes in the AYP formula that give credit for progress towards proficiency. Additionally, schools that show improvement should not lose money and other resources for failing to meet an arbitrary, absolute standard even though they are making progress.
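The arithmetic behind the two examples above can be sketched in a few lines. This is only an illustration: the 100-point scale, the function names and the proportional-credit rule are assumptions made for the example, not the state's, the AFT's or NCLB's actual formulas.

```python
def absolute_credit(gain: float, target: float) -> float:
    """All-or-nothing rule: credit only when the gain meets the target."""
    return 1.0 if gain >= target else 0.0


def progress_credit(gain: float, target: float) -> float:
    """Hypothetical alternative: credit in proportion to progress, capped at 1."""
    if target <= 0:
        raise ValueError("target must be positive")
    return max(0.0, min(gain / target, 1.0))


def max_possible_gain(score: float, ceiling: float = 100.0) -> float:
    """Largest gain still available, assuming a scale that tops out at `ceiling`."""
    return ceiling - score


# A school with a seven-point target that gains six points:
print(absolute_credit(6, 7))            # 0.0
print(round(progress_credit(6, 7), 2))  # 0.86

# A school already scoring 92 cannot gain 20 points on the assumed scale:
print(max_possible_gain(92) >= 20)      # False
```

Under the all-or-nothing rule the six-point gain counts for nothing, while a rule that credits progress would recognize most of it; and a school in the 90s cannot reach a 20-point improvement no matter how well it performs.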
5. The New York State Education Department should fully incorporate the use of performance based assessments in its accountability program that will allow students to demonstrate, over time, strengths, skills and knowledge in a variety of ways and that will engage students in real learning.
The selling point for standardized tests is that they are easy to score, analyze and report to the general public. However, when these tests are the only method for judging student achievement they can, as hundreds of participants in our high stakes testing forums noted, have an adverse effect on teaching and learning and drive wrong instructional and evaluative decisions. The reliance upon excessive and expensive standardized tests is designed not to educate but to produce quick and often "dirty" scores that are used to label and punish. Rather than a punitive and narrow view of what constitutes a good school and effective teaching, we should direct our efforts at providing alternative methods of assessment that can yield a richer, more sophisticated, more realistic view of what and how students are learning. These alternatives do exist.
States such as Nebraska, Wyoming, Maine and New Hampshire have incorporated multiple forms of assessment into their NCLB mandated statewide accountability systems in addition to or in place of a single measure of student performance derived from a standardized test. None of these states use standardized tests to make high stakes decisions. In Rhode Island administrators may use no more than 10% of a school's test scores to determine a school's overall quality rating, or a student's promotion, placement or graduation. Rhode Island also includes teacher, parent, student and administrator surveys about the learning environment in each school. It requires financial information about how schools spend their money. A team of evaluators also visits schools to assess progress and provide input into the final evaluation.
6. New York City must continue to support and fund the use of formative assessments in schools and classrooms.
Interim assessments, by definition, are intended to be formative. Formative assessments are what teachers do on a daily basis and teachers use these assessments, based on their professional judgment, to shape, develop, individualize and form instruction before students have moved on to a new topic or skill. When teachers give these formative diagnostic assessments the students, understandably, may not have reached the level of mastery that continued instruction can provide. Formative diagnostic assessments by their very nature are not meant to contribute to student grades or high stakes decisions; they are meant to improve instruction.
The assessments that the DOE is mandating are not designed to accomplish that purpose and they do not provide timely enough data to teachers to guide instruction. Although the department permitted some groups of schools to use "DYO" (design your own) assessments during the current school year, it is now suggesting it will only allow schools in the next school year to choose their interim assessments from an approved list of department vendors. There is rarely, if ever, alignment between packaged interim assessments, packaged test preparation materials and the state or city exams. The purchase of such materials does, however, provide a great deal of profit for test publishers, money that could be better spent in classrooms. Teachers must be allowed and supported in developing formative assessments tied directly into their day-to-day instruction that they can use to provide feedback to students that students can immediately use to reflect on their strengths, weaknesses and what they have to do to make progress. This kind of assessment is the single way to motivate students to take charge of their own learning.
New York State has already laid the groundwork for this. In the 1990s, at the time the state was reviewing its testing program, the New York State Board of Regents approved the establishment of an Assessment Quality Assurance and Assistance Panel as part of the state's Compact for Learning. This panel was intended to help schools move from a total reliance on standardized tests to one that incorporates other forms of assessment. This panel would also have evaluated the appropriateness of schools' assessment programs and the manner in which they are used and provided guidelines for the appropriate use of assessment data. It is time to revive this dormant proposal.
Instituting an accountability system that includes multiple forms of assessment must be comprehensive and ongoing and no single component of the system can be singled out for high stakes decisions. It will require professional development supported by appropriate resources to assist teachers in the development, instructional uses and evaluation of student portfolios and other performance based approaches to student assessment. Teachers who have used these types of assessments remind us that in order to incorporate portfolios and other approaches to student assessment as part of an overall program of evaluation their content must be aligned with the standards and the curriculum, just as it should be with standardized tests. Ensuring this occurs must be an integral part of any professional development plan.
The professional development must be geared to subject matter and level. Appropriate materials, technology and research must be available to help teachers in designing appropriate assessments. Teachers unfamiliar with such assessments should have the opportunity to work with colleagues on development and administration of performance based assessments in the classroom. This serious and collegial sharing of knowledge and expertise among teachers in and across schools can ensure that performance assessments do not degenerate into rigid, top-down formulaic evaluations of student performance. This kind of professional dialogue, sharing of ideas and constant monitoring of student performance can do more to really improve instruction than any effort to raise scores by teaching students to answer a few more questions correctly.
The use of these assessments may necessitate changes in school scheduling to accommodate portfolio review panels and other activities related to evaluation of these assessments. Ensuring time, resources and a model for professional development that is collegial and helpful (such as that described in the union’s career continuum proposal) can be subjects we address in collective bargaining.
Teachers too must introduce students to the proper development and use of portfolios as a tool to monitor and track their (students’) accomplishments. Educators can standardize the kind of work a student portfolio should contain for a specific purpose, e.g. graduation. The rubrics teachers use for judging the quality of student portfolios should include input from students when appropriate but can be shared across schools to ensure that they illustrate meaningful performance and hold students to high standards.
Just as teachers, no matter what their level of teaching experience, may be new to portfolio use, so may students. If portfolios and other performance based assessments are to become an integral part of evaluation and accountability systems, then their use must begin earlier than high school, which is where it currently begins in many places. Collections of student work and student driven, teacher guided, performance-based assessments, aligned with the standards, can begin in the early grades and follow a continuum of increasing familiarization and sophistication for students. Students must recognize that these types of assessments will require working in new and different ways.
Learning is a complex process. Teaching is a complex activity. Assessment should recognize and capture this complexity.
7. New York State must encourage and support schools that wish to enter the Performance Standards Consortium.
A model for the kind of assessment system we described above exists in New York State in the Performance Consortium schools that have received waivers from four of the five Regents exams, permitting teachers, on the high school level at least, to use a variety of assessment tools to evaluate student progress and adjust classroom instruction. For example, students may present and defend a portfolio of their work to panels of teachers, students and other educators, similar to the process used to evaluate PhD candidates. Of course all tests and assessments, including portfolios and other alternate assessments, must be valid (the results measure what the test is intended to measure and can be interpreted meaningfully) and reliable (the measures are consistent and objective). This requirement is especially important when tests and assessments assume a significant role in accountability systems.
The state has not granted additional waivers or provided information to schools that might be interested in using such an assessment system. It should allow additional schools that wish to enter the consortium to do so.
8. The UFT should ensure that teachers, parents, and the public in general receive accurate, clear information in order to understand the data that test results, attendance and graduation statistics and other sources can generate.
In order to encourage public understanding of and support for a variety of assessments and their place in holding students and schools accountable for achievement, parents and teachers must be knowledgeable about tests and assessments and receive information that will enable them to work effectively with children at home and understand what is happening in their child’s classroom. The DOE formula for computing a single letter grade for a school is neither clear nor transparent and does not accurately communicate real information about schools to parents, teachers and the public.
A single score or letter grade for an entire school is not a good starting point for a discussion. Teacher comments, written and oral, about specific aspects of a student's performance as well as appropriate rubrics can provide other, often more useful, individualized measures.
In order to increase awareness the UFT should develop a series of conferences, forums and seminars as part of a UFT Institute for Assessment and Accountability, similar to the UFT Teacher Center Urban Educator Forums, where parents and teachers can explore and debate the myriad issues related to assessment and accountability, especially those related to the use of multiple measures to evaluate student performance. We need to make “assessment literacy” a focus in professional development activities. Universities must make it a central element in pre-service teacher education.
9. States, schools, teacher unions and other educational organizations around the country should work together and with the federal government to explore uniform approaches to assessment and accountability based on a sampling of students from all grade levels, similar to the one the National Assessment of Educational Progress (NAEP) uses.
At this point it seems clear to many that, while well intended, NCLB is a flawed law that is not accomplishing what it set out to do. This is not merely a matter of the failure to fund it adequately. It has set unreasonable expectations and provides only sanctions to enforce those expectations. This punitive model has created a system that focuses on testing and test preparation, not on teaching and learning. It is not closing the achievement gap and the results on state assessments are not matching the results of NAEP or other measures used to audit claims of state performance. It has redefined educational priorities to make test scores in individual schools more important than instruction, planning and curriculum development. NCLB has provided an excuse for the Klein administration to institute excessive and unwarranted testing, and to misuse tests, in every grade of our city's schools to the detriment of students and teachers. A sampling procedure rather than high stakes tests of every student in every grade would put the focus back onto instruction and student learning.
The current belief in and dependence on testing, test preparation and data collection is harming students in this country. The huge amounts of money states and local school districts spend on packaged tests, test prep materials, testing consultants, and the scoring, collecting, analyzing and reporting of results could be much better spent directly in schools and classrooms. The time spent on test preparation and test administration could be better spent on instruction.
Although NCLB dictates that states must have a system to measure AYP, the components of this system vary wildly from state to state. New York State's (and most states') accountability systems rely almost totally on standardized tests of varying quality with some inclusion of attendance and graduation rates as additional indicators. Nebraska, at the other end of the accountability spectrum, allows school districts to use a portfolio of assessments (that may include standardized tests) that are then sent to the state for rating. Nebraska disseminates those local accountability models it considers the best around the state for possible replication.
No matter how good (or bad) an individual state's system may be, this patchwork of accountability does not provide a true picture of how students in individual states or the country as a whole are doing. This country's educational system needs a consistent accountability system that will encourage in-depth teaching and curriculum and that should include locally developed, valid and reliable performance based assessments. The creation, field testing and implementation of all new tests and assessments must involve educators. Teachers have knowledge of coursework and student development that makes them uniquely qualified to be part of this task. Teacher input can ensure that tests closely match coursework, measure many aspects of student achievement such as critical thinking and skill mastery, and provide useful data that will enhance instruction.
For example, it was only after student results on the spring 2003 administration of the Math A and Physics Regents Exams pointed up the lack of alignment between the coursework and the tests that the New York State Commissioner of Education solicited teacher recommendations for a more valid and reliable August 2003 administration of these exams. This points up the necessity of involving classroom teachers in this process from the beginning and not relying blindly on corporate test manufacturers and pre-packaged testing materials.
An accountability system must include an objective, external and collaborative school review process that assesses effective practices and provides a mechanism for the sharing of these effective practices. Schools, districts and states must report annually on their progress using a defined and agreed upon set of indicators such as uniformly calculated graduation rates, average daily attendance rates, teacher turnover and demographic information. This information can be supplemented with a sampling procedure using a standardized test such as the NAEP. This will allow for comparisons across schools, districts and states for those who believe this is meaningful and necessary.
In order to stop, not only in New York City but throughout the country, the excessive, duplicative, and expensive use of often flawed standardized tests that are not aligned with what is being taught, we should explore the possibility of using one method of tracking student and school progress from year to year nationwide, such as expanding the existing NAEP, currently used in many states but only on a voluntary basis and only as an addition to their local testing program.
Conclusion
The combined city, state and federal requirements have created a testing culture in our schools that is truly excessive and inappropriately high stakes. When the stakes become as high as they are now, unintended consequences can be the result. UFT members support accountability and meaningful assessment, but we are very concerned about the misuse of and over-reliance on high stakes tests to evaluate students, teachers and entire schools. We have given many different groups with many different educational and political beliefs the opportunity to express their opinions on the role that tests and assessments should play in our schools. The individuals on this task force mirror the variety of opinions contained in the education world at large. The recommendations represent a consensus of opinion and provide a basis for future discussion.
Implementing some of these recommendations will not be easy. It will require the wise use of time, money, and resources to create an assessment system that supports both good instruction and accountability. It will require educating and meeting with representatives of the DOE and members of the New York City Council to demand public explanations of the educational rationale behind decisions that city, state and federal officials make about testing policies and the expenditure of public funds. It will require that we work together with our colleagues in our statewide affiliate, NYSUT, and our national affiliate, AFT, to lobby for help to implement these changes.
We must also present these recommendations to the union membership in a variety of forums, meetings and discussion groups so we can use them to drive specific changes in classrooms. We anticipate that members from all levels who work with a variety of students will be able to guide the UFT leadership as to how an assessment and accountability program as recommended in this report can improve and enhance what occurs in their schools around the city.
We must recognize that using tests and assessments as tools for accountability is not the same as using tests and assessments as tools for improving instruction. The former is fraught with the pitfalls that come from the intuitively appealing but wrongheaded over-reliance on the scores; the latter can lead to improved student outcomes. In either case tests and assessments cannot stand apart from the other conditions and realities that obtain in our schools and our society. Quality instruction and leadership, small class size, the equitable distribution of resources, student support services, an appropriate and adequate infrastructure, public involvement and up-to-date technology provide conditions for student success that are more important than tests intended to measure those outcomes. The consequences for failing to provide these opportunities will show up in test results but we must not ignore the causes. Closing the achievement gap among the various groups in our schools and guaranteeing that all students have equal opportunities to learn is a priority. To the extent that tests and assessments can indicate that this is occurring, and provide models showing where there is success and warnings where there is failure, they are instructive. But mismeasures based on an over-reliance on flawed testing instruments hurt everyone.
How we measure our schools measures our society’s commitment to public education.
