ROTTEN APPLES: AN INVESTIGATION OF THE PREVALENCE AND PREDICTORS OF TEACHER CHEATING*

BRIAN A. JACOB AND STEVEN D. LEVITT
We develop an algorithm for detecting teacher cheating that combines information on unexpected test score fluctuations and suspicious patterns of answers
for students in a classroom. Using data from the Chicago public schools, we
estimate that serious cases of teacher or administrator cheating on standardized
tests occur in a minimum of 4-5 percent of elementary school classrooms annually. The observed frequency of cheating appears to respond strongly to relatively
minor changes in incentives. Our results highlight the fact that high-powered
incentive systems, especially those with bright line rules, may induce unexpected
behavioral distortions such as cheating. Statistical analysis, however, may provide a means of detecting illicit acts, despite the best attempts of perpetrators to
keep them clandestine.
I. INTRODUCTION
High-powered incentive schemes are designed to align the
behavior of agents with the interests of the principal implementing the system. A shortcoming of such schemes, however, is that
they are likely to induce behavioral distortions along other dimensions as agents seek to game the rules (see, for instance, Holmstrom and Milgrom [1991] and Baker [1992]). The distortions
may be particularly pronounced in systems with bright line rules
[Glaeser and Shleifer 2001]. It may be impossible to anticipate
the many ways in which a particular incentive scheme may be
gamed.
Test-based accountability systems in education provide an
excellent example of the costs and benefits of high-powered incentive schemes. In an effort to improve student achievement, a
number of states and districts have recently implemented programs that use student test scores to punish or reward schools.
* We would like to thank Suzanne Cooper, Mark Duggan, Susan Dynarski,
Arne Duncan, Michael Greenstone, James Heckman, Lars Lefgren, two anonymous referees, the editor, Edward Glaeser, and seminar participants too numerous to mention for helpful comments and discussions. We also thank Arne Duncan, Philip Hansen, Carol Perlman, and Jessie Qualles of the Chicago public
schools for their help and cooperation on the project. Financial support was
provided by the National Science Foundation and the Sloan Foundation. All
remaining errors are our own. Addresses: Brian Jacob, Kennedy School of Government, Harvard University, 79 JFK Street, Cambridge, MA 02138; Steven
Levitt, Department of Economics, University of Chicago, 1126 E. 59th Street,
Chicago, IL 60637.
© 2003 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology.
The Quarterly Journal of Economics, August 2003
1. The federal legislation, No Child Left Behind, was passed in 2001. Prior to
this legislation, virtually every state had linked test-score outcomes to school
funding or required students to pass an exit examination to graduate high school.
In the state of California, a policy providing for merit pay bonuses of as much as
$25,000 per teacher in schools with large test score gains was recently put into
place.
2. Hereinafter, we use the phrase "teacher cheating" to encompass cheating done by either teachers or administrators.
3. We have no way of knowing whether the patterns we observe arise because
a teacher explicitly alters students’ answer sheets, directly provides answers to
students during a test, or perhaps makes test materials available to students in
advance of the exam (for instance, by teaching a reading passage that is on the
test). If we had access to the actual exams, it might be possible to distinguish
between these scenarios through an analysis of erasure patterns.
4. In contrast, there is a well-developed statistics literature for identifying whether one student has copied answers from another student [Wollack 1997; Holland 1996; Frary 1993; Bellezza and Bellezza 1989; Frary, Tideman, and Watts 1977; Angoff 1974]. These methods involve the identification of unusual patterns of agreement in student responses and, for the most part, are only effective in identifying the most egregious cases of copying.

5. In the mid-eighties, Perlman [1985] investigated suspected cheating in a number of Chicago public schools (CPS). The study included 23 suspect schools—identified on the basis of a high percentage of erasures, unusual patterns of score increases, unnecessarily large orders of blank answer sheets for the ITBS, and tips to the CPS Office of Research—along with 17 comparison schools. When a second form of the test was administered to the 40 schools under more controlled conditions, the suspect schools did much worse than the comparison schools. An analysis of several dozen Los Angeles schools where the percentage of erasures and changed answers was unusually high revealed evidence of teacher cheating [Aiken 1991]. One of the most highly publicized cheating scandals involved Stratfield elementary, an award-winning school in Connecticut. In 1996 the firm that developed and scored the exam found that the rate of erasures at Stratfield was up to five times greater than other schools in the same district and that 89 percent of erasures at Stratfield were from an incorrect to a correct response. Subsequent retesting resulted in significantly lower scores [Lindsay 1996].

6. We do not, however, have access to the actual test forms that students filled out, so we are unable to analyze these tests for evidence of suspicious patterns of erasures.

Recent federal legislation institutionalizes this practice, requiring states to test elementary students each year, rate schools on the basis of student performance, and intervene in schools that do not make sufficient improvement.1 Several prior studies suggest that such accountability policies may be effective at raising student achievement [Richards and Sheu 1992; Grissmer, Flanagan, et al. 2000; Deere and Strayer 2001; Jacob 2002; Carnoy and Loeb 2002; Hanushek and Raymond 2002]. At the same time, however, researchers have documented instances of manipulation, including shifts away from nontested areas or "teaching to the test" [Klein et al. 2002; Jacob 2002], and increasing placement in special education [Jacob 2002; Figlio and Getzler 2002; Cullen and Reback 2002].

In this paper we explore a very different mechanism for inflating test scores: outright cheating on the part of teachers and administrators.2 As incentives for high test scores increase, unscrupulous teachers may be more likely to engage in a range of illicit activities, including changing student responses on answer sheets, providing correct answers to students, or obtaining copies of an exam illegitimately prior to the test date and teaching students using knowledge of the precise exam questions.3 While such allegations may seem far-fetched, documented cases of cheating have recently been uncovered in California [May 2000], Massachusetts [Marcus 2000], New York [Loughran and Comiskey 1999], Texas [Kolker 1999], and Great Britain [Hofkins 1995; Tysome 1994].

There has been very little previous empirical analysis of teacher cheating.4 The few studies that do exist involve investigations of specific instances of cheating and generally rely on the analysis of erasure patterns and the controlled retesting of students.5 While this earlier research provides convincing evidence of isolated cheating incidents, our paper represents the first systematic attempt to (1) identify the overall prevalence of teacher cheating empirically and (2) analyze the factors that predict cheating. To address these questions, we use detailed administrative data from the Chicago public schools (CPS) that includes the question-by-question answers given by every student in grades 3 to 8 who took the Iowa Test of Basic Skills (ITBS) from 1993 to 2000.6 In addition to the test responses, we also have access to each student's full academic record, including past test scores, the school and room to which a student was assigned, and extensive demographic and socioeconomic characteristics.
Our approach to detecting classroom cheating combines
two types of indicators: unexpected test score fluctuations and
unusual patterns of answers for students within a classroom.
Teacher cheating increases the likelihood that students in a
classroom will experience large, unexpected increases in test
scores one year, followed by very small test score gains (or even
declines) the following year. Teacher cheating, especially if
done in an unsophisticated manner, is also likely to leave
tell-tale signs in the form of blocks of identical answers, unusual patterns of correlations across student answers within
the classroom, or unusual response patterns within a student’s
exam (e.g., a student who answers a number of very difficult
questions correctly while missing many simple questions). Our
identification strategy exploits the fact that these two types of
indicators are very weakly correlated in classrooms unlikely to
have cheated, but very highly correlated in situations where
cheating likely occurred. That allows us to credibly estimate
the prevalence of cheating without having to invoke arbitrary
cutoffs as to what constitutes cheating.
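The identification logic—two indicators that are weakly correlated among honest classrooms but strongly correlated among cheaters—can be illustrated with a stylized simulation. The sketch below is not the paper's actual estimator, and every parameter value in it is hypothetical; it simply shows how excess mass in the joint upper tail of the two indicators, relative to what independence would predict, reveals the presence of a cheating group:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stylized population: 9,500 honest classrooms whose two indicators are
# uncorrelated noise, plus 500 cheaters for whom both indicators are
# inflated. (Illustrative numbers only -- not estimates from the paper.)
n_honest, n_cheat = 9500, 500
honest = rng.normal(size=(n_honest, 2))         # indicators ~ independent
cheat = rng.normal(loc=2.5, size=(n_cheat, 2))  # both indicators shifted up
both = np.vstack([honest, cheat])

# Share of classrooms in the joint upper tail (top 5% on each indicator)
q1 = np.quantile(both[:, 0], 0.95)
q2 = np.quantile(both[:, 1], 0.95)
in_tail = np.mean((both[:, 0] > q1) & (both[:, 1] > q2))

# Under independence roughly 0.05 * 0.05 = 0.0025 of classrooms would land
# in the joint tail; the excess mass is attributable to the cheating group.
excess = in_tail - 0.05 * 0.05
print(f"joint-tail share: {in_tail:.4f}, excess over independence: {excess:.4f}")
```

Because the honest classrooms contribute almost nothing to the joint tail, the excess mass tracks the size of the cheating group without requiring a hard classification of any individual classroom.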
Empirically, we detect cheating in approximately 4 to 5 percent of the classes in our sample. This estimate is likely to
understate the true incidence of cheating for two reasons. First,
we focus only on the most egregious type of cheating, where
teachers systematically alter student test forms. There are other
more subtle ways in which teachers can cheat, such as providing
extra time to students, that our algorithm is unlikely to detect.
Second, even when test forms are altered, our approach is only
partially successful in detecting illicit behavior. As discussed
later, when we ourselves simulate cheating by altering student
answer strings and then testing for cheating in the artificially
manipulated classrooms, many instances of moderate cheating go
undetected by our methods.
A number of patterns in the results reinforce our confidence that what we measure is indeed cheating. First, simulation results demonstrate that there is nothing mechanical
about our identification approach that automatically generates
patterns like those observed in the data. When we randomly
assign students to classrooms and search for cheating in these
simulated classes, our methods find little evidence of cheating.
Second, cheating on one part of the test (e.g., math) is a strong
predictor of cheating on other sections of the test (e.g., reading). Third, cheating is also correlated within classrooms over
time and across classrooms in a particular school. Finally, and
perhaps most convincingly, with the cooperation of the Chicago
public schools we were able to conduct a prospective test of our
methods in which we retested a subset of classrooms under
controlled conditions that precluded teacher cheating. Classrooms identified as likely cheaters experienced large declines
in test scores on the retest, whereas classrooms not suspected
of cheating maintained their test score gains.
The prevalence of cheating is also shown to respond to relatively minor changes in teacher incentives. The importance of standardized tests in the Chicago public schools increased substantially with a change in leadership in 1996, particularly for low-achieving students. Following the introduction of these policies, the prevalence of cheating rose sharply in low-achieving classrooms, whereas classes with average or higher-achieving students showed no increase in cheating. Cheating prevalence also appears to be systematically lower in cases where the costs of cheating are higher (e.g., in mixed-grade classrooms in which two different exams are administered simultaneously), or the benefits of cheating are lower (e.g., in classrooms with more special education or bilingual students who take the standardized tests, but whose scores are excluded from official calculations).

The remainder of the paper is structured as follows. Section II discusses the set of indicators we use to capture cheating behavior. Section III describes our identification strategy. Section IV provides a brief overview of the institutional details of the Chicago public schools and the data set that we use. Section V reports the basic empirical results on the prevalence of cheating and presents a wide range of evidence supporting the interpretation of these results as cheating. Section VI analyzes how teacher cheating responds to incentives. Section VII discusses the results and the implications for increasing reliance on high-stakes testing.

II. INDICATORS OF TEACHER CHEATING

Teacher cheating, especially in extreme cases, is likely to leave tell-tale signs. In motivating our discussion of the indicators we employ for detecting cheating, it is instructive to compare two actual classrooms taking the same test (see Figure I). Each row in Figure I represents one student's answers to each item on the test. Columns correspond to the different questions asked. The letter "A," "B," "C," or "D" means a student provided the correct answer. If a number is entered, the student answered the question incorrectly, with "1" corresponding to a wrong answer of "A," "2" corresponding to a wrong answer of "B," etc. On the right-hand side of the table, we also present student test scores for the preceding, current, and following year. Test scores are in units of "grade equivalents." The grading scale is normed so that a student at the national average for sixth graders taking the test in the eighth month of the school year would score 6.8. A typical student would be expected to gain one grade equivalent for each year of school.

FIGURE I
Sample Answer Strings and Test Scores from Two Classrooms

The data in the figure represent actual answer strings and test scores from two CPS classrooms taking the same exam. The top classroom is suspected of cheating; the bottom classroom is not. Each row corresponds to an individual student. Each column represents a particular question on the exam. A letter indicates that the student gave that answer and the answer was correct. A number means that the student gave the corresponding letter answer (e.g., 1 = "A"), but the answer was incorrect. A value of "0" means the question was left blank. Student test scores, in grade equivalents, are shown in the last three columns of the figure. The test year for which the answer strings are presented is denoted year t. The scores from years t - 1 and t + 1 correspond to the preceding and following years' examinations.

The top panel of data shows a class in which we suspect teacher cheating took place; the bottom panel corresponds to a typical classroom. Two striking differences between the classrooms are readily apparent. First, in the cheating classroom, a large block of students provided identical answers on consecutive questions in the middle of the test, as indicated by the boxed area in the figure. For the other classroom, no such pattern exists. Second, looking at the pattern of test scores, students in the cheating classroom experienced large increases in test scores from the previous to the current year (1.7 grade equivalents on average), and actually experienced declines on average the following year. In contrast, the students in the typical classroom gained roughly one grade equivalent each year, as would be expected.

The indicators we use as evidence of cheating formalize and extend the basic picture that emerges from Figure I. We divide these indicators into two distinct groups that, respectively, capture unusual test score fluctuations and suspicious patterns of answer strings. In this section we describe informally the measures that we use. A more rigorous treatment of their construction is provided in the Appendix.

II.A. Indicator One: Unexpected Test Score Fluctuations

Given that the aim of cheating is to raise test scores, one signal of teacher cheating is an unusually large gain in test scores relative to how those same students tested in the previous year. Since test score gains that result from cheating do not represent real gains in knowledge, there is no reason to expect the gains to be sustained on future exams taken by these students (unless, of course, next year's teachers also cheat on behalf of the students). Thus, large gains due to cheating should be followed by unusually small test score gains for these students in the following year. In contrast, if large test score gains are due to a talented teacher, the student gains are likely to have a greater permanent component, even if some regression to the mean occurs. We construct a summary measure of how unusual the test score fluctuations are by ranking each classroom's average test score gains relative to all other classrooms:
(3)   SCORE_cbt = (rank_gain_{c,b,t})^2 + (1 - rank_gain_{c,b,t+1})^2,

where rank_gain_{c,b,t} is the percentile rank for class c in subject b in year t. Classes with relatively big gains on this year's test and relatively small gains on next year's test will have high values of SCORE. Squaring the individual terms gives relatively more weight to big test score gains this year and big test score declines the following year.8 In the empirical analysis we consider three possible cutoffs for what it means to have a "high" value on SCORE, corresponding to the eightieth, ninetieth, and ninety-fifth percentiles among all classrooms in the sample.
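Equation (3) can be computed directly from classroom-level gain data. The sketch below uses hypothetical gains and a simple within-sample percentile rank (ignoring ties), rather than the paper's ranking within subject, grade, and year:

```python
import numpy as np

def score_measure(gain_t, gain_t1):
    """Composite fluctuation measure in the spirit of equation (3).

    gain_t  : average test score gain for each classroom this year
    gain_t1 : average gain for the same students the following year
    Ranks are percentiles in [0, 1] over the classrooms supplied here.
    """
    n = len(gain_t)
    # double-argsort yields each classroom's rank position (ties ignored)
    rank_t = np.argsort(np.argsort(gain_t)) / (n - 1)
    rank_t1 = np.argsort(np.argsort(gain_t1)) / (n - 1)
    # a big gain now (rank near 1) and a small gain next year (rank near 0)
    # both push SCORE toward its maximum of 2
    return rank_t ** 2 + (1 - rank_t1) ** 2

# Hypothetical classrooms: the first jumps 2.4 grade equivalents, then falls
gains_now = np.array([2.4, 1.0, 0.9, 1.1, 0.8])
gains_next = np.array([-0.2, 1.0, 1.1, 0.9, 1.2])
scores = score_measure(gains_now, gains_next)
flagged = scores > np.quantile(scores, 0.80)  # eightieth-percentile cutoff
```

Only the first classroom, with the large gain followed by a decline, exceeds the eightieth-percentile cutoff in this toy example.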
II.B. Indicator Two: Suspicious Answer Strings
The quickest and easiest way for a teacher to cheat is to alter
the same block of consecutive questions for a substantial portion
of students in the class, as was apparently done in the classroom
in the top panel of Figure I. More sophisticated interventions
might involve skipping some questions so as to avoid a large block
of identical answers, or altering different blocks of questions for
different students.
We combine four different measures of how suspicious a
classroom’s answer strings are in determining whether a classroom may be cheating. The first measure focuses on the most
unlikely block of identical answers given by students on consecutive questions. Using past test scores, future test scores, and
background characteristics, we predict the likelihood that each
student will give each possible answer (A, B, C, or D) on every
question using a multinomial logit. Each student’s predicted
probability of choosing a particular response is identified by the
likelihood that other students (in the same year, grade, and
subject) with similar background characteristics will choose that
7. We also experimented with more complicated mechanisms for…
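To make the first answer-string measure concrete, the sketch below searches a classroom for the least likely block of identical consecutive answers. It takes the predicted answer probabilities as given (in the paper these come from the multinomial logit; here they are a uniform placeholder), and the function name and thresholds are likewise illustrative rather than the paper's:

```python
import numpy as np

def most_unlikely_block(answers, probs, min_students=3, min_len=4):
    """Find the least likely block of identical consecutive answers.

    answers : (n_students, n_questions) array of chosen options 0-3
    probs   : (n_students, n_questions, 4) predicted choice probabilities,
              assumed to be estimated elsewhere (e.g., a multinomial logit)
    Returns the smallest joint log-probability over all windows of at least
    min_len consecutive questions on which at least min_students students
    gave exactly the same answer string; 0.0 if no such block exists.
    """
    n_stu, n_q = answers.shape
    best = 0.0
    for start in range(n_q - min_len + 1):
        for end in range(start + min_len, n_q + 1):
            block = answers[:, start:end]
            keys = [tuple(row) for row in block]  # each student's string
            for key in set(keys):
                members = [i for i, k in enumerate(keys) if k == key]
                if len(members) < min_students:
                    continue
                # joint log-probability of the shared block, assuming
                # answers are independent given student characteristics
                lp = sum(np.log(probs[i, start + j, key[j]])
                         for i in members for j in range(len(key)))
                best = min(best, lp)
    return best

# Toy classroom: students 0-2 share identical answers on every question,
# student 3 answers differently; uniform placeholder probabilities.
answers = np.array([[0] * 8, [0] * 8, [0] * 8, [1, 2, 3, 0, 1, 2, 3, 0]])
probs = np.full((4, 8, 4), 0.25)
best = most_unlikely_block(answers, probs)
```

The more students share a block, the longer the block, and the less likely each shared answer is for those students, the more negative the log-probability, and hence the more suspicious the classroom.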