Court Software May Be No More Accurate than Web Survey Takers in Predicting Criminal Risk

January 17, 2018 – A widely used computer software tool may be no more accurate or fair at predicting repeat criminal behavior than people with no criminal justice experience, according to a Dartmouth College study.

The Dartmouth analysis showed that non-experts who responded to an online survey performed as well as the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) software system used by courts to help determine the risk of recidivism.

The paper also demonstrates that although COMPAS uses over one hundred pieces of information to make a prediction, the same level of accuracy may be achieved with only two variables – a defendant’s age and number of prior convictions. 

According to the research paper, COMPAS has been used to assess over one million offenders since it was developed in 1998, with its recidivism prediction component in use since 2000.

The analysis, published in the journal Science Advances, was carried out by the student-faculty research team of Julia Dressel and Hany Farid.

“It is troubling that untrained internet workers can perform as well as a computer program used to make life-altering decisions about criminal defendants,” said Farid, the Albert Bradley 1915 Third Century Professor of Computer Science at Dartmouth College. “The use of such software may be doing nothing to help people who could be denied a second chance by black-box algorithms.” 

According to the paper, software tools are used in pretrial, parole, and sentencing decisions to predict criminal behavior, including who is likely to fail to appear at a court hearing and who is likely to reoffend at some point in the future. Supporters of such systems argue that big data and advanced machine learning make these analyses more accurate and less biased than predictions made by humans.

“Claims that secretive and seemingly sophisticated data tools are more accurate and fair than humans are simply not supported by our research findings,” said Dressel, who performed the research as part of her undergraduate thesis in computer science at Dartmouth.

The research paper compares the commercial COMPAS software against workers contracted through Amazon’s online Mechanical Turk crowd-sourcing marketplace to see which approach is more accurate and fair when judging the possibility of recidivism. For the purposes of the study, recidivism was defined as committing a misdemeanor or felony within two years of a defendant’s last arrest. 

Groups of internet workers saw short descriptions that included a defendant’s sex, age, and previous criminal history. The human results were then compared to results from the COMPAS system, which uses 137 variables for each individual.

Overall accuracy was based on the rate at which a defendant was correctly predicted to recidivate or not. The research also reported on false positives—when a defendant is predicted to recidivate but doesn’t—and false negatives—when a defendant is predicted not to recidivate but does. 
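The three measures described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name `confusion_summary` and the six toy predictions are hypothetical and are not taken from the study, and the rates are computed in the conventional way (false positives as a share of defendants who did not recidivate, false negatives as a share of those who did).

```python
def confusion_summary(predicted, actual):
    """Summarize predictions against outcomes.

    predicted, actual: equal-length lists of booleans,
    where True means "recidivates" (hypothetical encoding).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # predicted yes, did recidivate
    tn = sum(1 for p, a in pairs if not p and not a)  # predicted no, did not
    fp = sum(1 for p, a in pairs if p and not a)      # predicted yes, did not
    fn = sum(1 for p, a in pairs if not p and a)      # predicted no, did recidivate

    return {
        # Share of all defendants whose outcome was predicted correctly.
        "accuracy": (tp + tn) / len(pairs),
        # Among those who did not recidivate, share wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Among those who did recidivate, share wrongly cleared.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Toy data, invented purely for illustration (not from the study).
predicted = [True, True, False, False, True, False]
actual = [True, False, False, True, True, False]
summary = confusion_summary(predicted, actual)
print(summary)
```

Comparing the two error rates separately for different groups of defendants is one way to examine the kind of disparities the research reports.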

The human participants worked with considerably less information than COMPAS: seven features compared to 137. When their results were pooled to determine the “wisdom of the crowd,” the participants, who had no presumed criminal justice experience, were accurate in 67 percent of the cases presented, statistically the same as the 65.2 percent accuracy of COMPAS. Study participants and COMPAS agreed on 69.2 percent of the 1,000 defendants when predicting who would repeat their crimes.
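As a rough illustration of pooling individual responses and measuring agreement, the following Python sketch assumes a simple majority vote. This release does not spell out the pooling rule, so the majority rule, the tie handling, the function names, and the toy votes below are all assumptions made for illustration.

```python
def majority_vote(votes):
    """Pool one defendant's yes/no responses by simple majority.

    votes: list of booleans, True = "will recidivate".
    Ties count as False (an arbitrary choice made for this sketch).
    """
    return sum(votes) * 2 > len(votes)


def agreement_rate(first, second):
    """Share of defendants on which two sets of predictions match."""
    if len(first) != len(second):
        raise ValueError("prediction lists must have the same length")
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return matches / len(first)


# Toy responses: one inner list per defendant (invented, not study data).
votes_per_defendant = [
    [True, True, False],
    [False, False, True],
    [True, False, True],
    [False, False, False],
]
pooled = [majority_vote(v) for v in votes_per_defendant]

# Hypothetical software predictions for the same four defendants.
tool_predictions = [True, True, True, False]
rate = agreement_rate(pooled, tool_predictions)
print(pooled, rate)
```

The same `agreement_rate` calculation, applied to the pooled human predictions and the software's predictions, is the kind of comparison behind an agreement figure such as the one quoted above.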

According to the study, the question of accurate prediction of recidivism is not limited to COMPAS. A separate review cited in the study found that eight of nine software programs failed to make accurate predictions.

“The entire use of recidivism prediction instruments in courtrooms should be called into question,” Dressel said. “Along with previous work on the fairness of criminal justice algorithms, these combined results cast significant doubt on the entire effort of predicting recidivism.”

In contrast to other analyses that focus on whether algorithms are racially biased, the Dartmouth study considers the more fundamental issue of whether the COMPAS algorithm is any better than untrained humans at predicting recidivism in an accurate and fair way.

However, when race was considered, the research found that results from both the human respondents and the software showed significant disparities between how black and white defendants are judged.

According to the paper, it is valuable to ask if we would put these decisions in the hands of untrained people who respond to an online survey, because, in the end, “the results from these two approaches appear to be indistinguishable.”