The Trouble with Teachers Using AI to Grade Student Work: A Step Toward Obsolescence

If you’ve talked to a teacher recently, you’ve likely heard frustrations about the impact of AI on students’ attention spans, reading comprehension, and, of course, cheating.
As AI becomes more ubiquitous in daily life—thanks to tech companies pushing it into every corner—students are increasingly using software like ChatGPT to complete assignments. According to a study by the Digital Education Council, 86 percent of university students rely on some form of AI for their work.
In response to this surge, some teachers are taking matters into their own hands—by using AI chatbots to grade student assignments. One teacher even posted on Reddit, “You’re welcome to use AI. Just let me know. If you do, the AI will also grade you. You don’t write it, I don’t read it.”
While some educators embrace AI as a time-saving tool—one professor in Ithaca, NY, even requires students to run their essays through AI before submitting them—others are beginning to question whether AI is truly up to the task of grading. According to researchers at the University of Georgia, it’s not.
The university’s research team tasked an AI model, Mixtral, with grading middle school homework responses. Instead of using a human-designed grading rubric, the team asked the AI to create its own system. The results were startlingly poor.
Mixtral graded the assignments accurately only 33.5 percent of the time, and even with a human-created rubric, its accuracy didn’t top 50 percent. While the AI graded quickly, it often relied on flawed reasoning, producing scores that were inconsistent and lacked logical depth.
“While LLMs can adapt quickly to scoring tasks, they often resort to shortcuts, bypassing deeper logical reasoning expected in human grading,” explained Xiaoming Zhai, one of the researchers. For example, the AI assumed that a student who mentioned a temperature increase understood the movement of particles, a logical jump that a human grader would not make without further evidence.
Even the researchers conceded that supplying high-quality rubrics lifted the accuracy rate only to a paltry 50 percent. This is the same AI technology touted as heralding a “new epoch,” yet it can barely manage to grade correctly half the time.
Imagine if your car had a 50 percent chance of breaking down on the highway; you’d never let it out of the driveway. So why are we comfortable taking the same gamble with student education?
This underscores the growing realization that AI, no matter how sophisticated, is no replacement for the nuanced understanding of a live teacher. In fact, newer AI systems appear to be getting less reliable, not more: a recent New York Times report revealed that the latest generation of AI models “hallucinates” (generates false information) up to 79 percent of the time, far worse than earlier versions.
By relying on AI to grade, teachers are leaving their students in the hands of technology that is often inaccurate, overly eager to please, and prone to providing false information. And that’s before considering the potential cognitive decline in students who rely on AI regularly. If this is how we plan to address the AI cheating crisis, perhaps it’s time to reconsider the entire system—maybe it would be simpler to cut out the middleman and let students work directly with their artificial counterparts.