Children who use ChatGPT as a study assistant perform worse on tests

Does AI actually help students learn? A recent high school experiment offers a cautionary tale.

Researchers at the University of Pennsylvania found that Turkish high school students who had access to ChatGPT while practicing math problems performed worse than students without it. Those with ChatGPT solved 48 percent more practice problems correctly, but ultimately scored 17 percent worse on a test of the topic they had been learning.

A third group of students had access to a revised version of ChatGPT that functioned more like a tutor. This chatbot was programmed to provide hints without immediately revealing the answer. Students using it performed dramatically better on practice problems, correctly solving 127 percent more of them than students who worked without high-tech tools. But on the subsequent test, these AI-guided students did no better; their scores matched those of students who did their practice problems the old-fashioned way, on their own.

The researchers titled their paper, “Generative AI Can Harm Learning,” to make the case to parents and educators that the current crop of freely available AI chatbots “can significantly hinder learning.” Even a refined version of ChatGPT designed to mimic a tutor doesn’t necessarily help.

The researchers believe the problem is that students are using the chatbot as a “crutch.” When they analyzed the questions students typed into ChatGPT, students often simply asked for the answer. Students were not building the skills that come from solving the problems themselves.

ChatGPT’s errors may also have been a contributing factor. The chatbot got the math problems right only about half the time. Its arithmetic calculations were wrong 8 percent of the time, but the bigger problem was that its step-by-step approach to solving a problem was wrong 42 percent of the time. The tutoring version of ChatGPT produced correct solutions from the start, so these errors were minimized.

A draft article about the experiment was posted on the website of SSRN, formerly known as the Social Science Research Network, in July 2024. The article has not yet been published in a peer-reviewed journal and is still subject to revision.

This is just one experiment, conducted in another country, and more studies are needed to confirm its findings. But it was large, involving nearly 1,000 students in grades nine through 11 in the fall of 2023. Teachers first reviewed a previously taught lesson with the whole class, then randomly assigned their classes to practice the math in one of three ways: with access to ChatGPT, with access to an AI tutor powered by ChatGPT, or with no high-tech tools at all. Students in every group worked on the same practice problems. Then they took a test to see how well they had learned the concept. Researchers ran four cycles of this, giving students four 90-minute practice sessions on four different math topics, to gauge whether AI tends to help, hurt, or do nothing.

ChatGPT also seems to breed overconfidence. In surveys accompanying the experiment, students said they didn’t think ChatGPT had caused them to learn less, even though it had. Students with the AI tutor thought they did significantly better on the test, even though they didn’t. (It’s also a good reminder to all of us that our perceptions of how much we’ve learned are often wrong.)

The authors compared the problem of learning with ChatGPT to autopilot. They explained how overreliance on autopilot led the Federal Aviation Administration to recommend that pilots minimize their use of the technology. Regulators wanted to ensure that pilots still knew how to fly if the autopilot malfunctioned.

ChatGPT is not the first technology to offer a tradeoff in education. Typewriters and computers reduce the need to write by hand. Calculators reduce the need to calculate. When students have access to ChatGPT, they can solve more problems correctly, but learn less. Getting the right result on one problem doesn’t help them on the next.

This story about using ChatGPT to practice math was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.
