Test Validity and Test Reliability

Jackie Burt
Jul 6, 2013

In addition to concerns about killing engagement, we must begin to consider more diligently the validity and reliability of standardized tests in an educational environment.

For decades, the emphasis placed on test scores has captured our nation’s attention. Reform efforts have a common end goal: test scores and data. Important, life-changing decisions are based on the data derived from test scores: a child’s learning plan, a teacher’s tenure, a school’s reputation, an administrator’s trust, school programs, funding. Data drives detrimental decisions, such as the elimination of art and music programming, field trips, and the like. If such important decisions depend on “big data,” as Inda Schaenen writes in her article “Driving Miss Data—or Is Big Data Driving Schools?”,[i] we must begin to consider the validity and reliability of standardized tests in an educational environment.

Of course, an extensive amount of research goes into test creation. I have no doubt that the Educational Testing Service (ETS), the world’s largest testing and assessment organization and the source of almost every widely used standardized test,[ii] creates tests with adequate consideration of scientific validity. To earn the esteemed label of valid, a test must meet several highly researched standards as outlined in the Standards for Educational and Psychological Testing (1999) by the American Educational Research Association (AERA).[iii] The process includes gathering evidence to provide “a sound scientific basis” for interpreting test scores. Tests hit our students’ desks having enjoyed an extensive history of research and analysis. From this perspective, the tests we give our children are trusted, highly developed, high-quality materials. However, even tests of the highest quality fall short of accounting for the most significant variable: students.

As educators, parents, and administrators, even if we trust that a test is built to adequately measure what it is designed to measure—the contents of a child’s school year—and even if we put our faith in the validity of such an assessment, we have still not considered the most significant and invalidating variable. For a scientific study to yield valid results, it must include reliable controls.

Standardized tests, by definition, are meant to be administered and scored according to a common standard. The consistency with which they are delivered to students is designed to eliminate extraneous variables, thus leading to a more effectively controlled study. Across a district, tests are administered on a certain day or week. Students are given a determined amount of time to complete tests. Classrooms and hallways are quiet zones meant to offer highly focused climates. Students are asked to eat well and sleep well prior to test day, and many schools offer a healthy snack before test booklets are cracked open. Standardized tests are meant to be delivered in a standard way. In theory, this eliminates unwanted variables. We are clearly aware of and concerned about controlling variables.

Some of the more glaring variables that students may bring to the table are also taken into consideration. Results of standardized tests often include recognition of population variables such as English Language Learners (ELL) and students eligible for free and reduced lunch (i.e., a socioeconomic element). Such recognition suggests that testing companies, governing bodies, and school officials agree that certain populations of students will alter test results. Why do we stop at these glaring variables, ignore potentially more significant invalidating variables, and call it good?

Although many children perform to the best of their ability and head into test days highly focused and ready for the charge, many others do not; this is the most significant extraneous variable facing and arguably invalidating standardized tests. Emotional, physical, and motivational factors can directly alter a child’s ability to show what he or she knows on a standardized test and should be considered variables for which we do not have adequate controls in place.

Emotional factors are potentially the most significant variables entering into test day for students. While one student has had a lovely morning with parents and siblings around the breakfast table discussing the day’s upcoming schedule, another child has experienced an irritable, overtired mother who raises her voice, shouting morning routine demands, allowing doubt and fear to linger. And while one child is properly dressed and feeling confident, another child cannot focus on anything beyond his high-water pants that were pointed out by someone on the playground. Fear of failure and lack of confidence are issues for some children. Test anxiety is a very real and detrimental emotional factor on test day. Feeling the need to rush through test questions in an effort to not finish last can affect a test taker. Self-doubt and inflated confidence can both be detrimental to answering questions correctly. A bad mood can affect performance. The boredom many children experience in a testing environment can completely sap motivation and desire to perform well. And simply the unfamiliar situation that test days bring to an otherwise vibrant classroom setting can impact student performance.

Physical factors can play a significant role as well and are intertwined with emotional and motivational factors. One child is adequately fed and rested, while another feels the rumbles of a hungry tummy. Distractions, such as a noise outside or a fellow student wiggling, can be significant barriers to focus. Fatigue and lack of nutrition can change a child’s ability to do well and can eliminate motivation.

All too common is the conversation in which a student admits to creating a pattern of bubbles on the answer sheet of a multiple-choice test. Motivational variables such as these are significant and arise on a regular basis. Older students are regularly annoyed at having to spend the better part of a week in a testing environment. Many such students find very little value in performing at their best and display passive and/or active rebellion during testing. If students know that a test does not directly affect their school-year grades, as is the case for many annual standardized tests, they have little to no buy-in. Lack of motivation in students directly affects test validity and reliability.

Testing advocates will likely disagree, turning a blind eye to the reality of such invalidating variables, but surely history will eventually reveal the folly with which our culture has revered high-stakes testing. If we could effectively eliminate emotional, physical, and motivational factors from groups of students, then we might be more able to trust the validity of standardized tests. And we would be able to make important decisions with confidence. But we cannot. We cannot eliminate these significant factors. So, at the very least, we should give less credit to test results and data.

In addition, mainstream education and the testing industry have convinced parents that test scores are a viable means of measuring the success of a school or teacher. By and large, parents think a school is “good” or “bad” or a specific teacher is “good” or “bad” based on test scores. To ignore the thousands of more important factors, such as a child’s well-being, engagement, social/emotional development, or essential life skills in favor of data is a short-sighted approach to sizing up a given school or teacher.

We must begin to question these practices with the doubt they deserve and turn to more valid forms of measurement and decision making for the students we serve. We should utilize current research about what will best serve students and communicate about student needs on an individual student-by-student basis. We should take a more human approach to educational reform that recognizes human nature, human tendencies, and human capacity.

Furthermore, we have much more valuable assessment options. All teachers are trained in performance-based products and assignments. They use classroom and formative assessments day in and day out. They help students through the learning process and gather feedback about student progress on a minute-by-minute basis. Trusting a teacher’s ability to filter for variables such as hunger, mood, and frustration and to make instructional decisions on an individual basis over a large sample of time, such as a school year, will yield more accurate and pertinent assessment results.

[i] “Driving Miss Data—or Is Big Data Driving Schools?” St. Louis Beacon, July 7, 2013, https://www.stlbeacon.org/?_escaped_fragment_=/content/31517/voices_inda_data_062013.

[ii] www.ets.org.

[iii] “Standards for Educational and Psychological Testing,” The American Psychological Association, 1999, accessed July 7, 2011, https://www.apa.org/science/standards.html.
