
Staggering sums are being invested in AI-related companies such as Nvidia and OpenAI, sums that make millions of dollars seem like loose change. This investing frenzy is not entirely unwarranted, given the promise of AI. Free AI tools, for example, have been used to write an award-winning book. Hollywood is salivating at the prospect of hiring an AI actress. OpenAI recently released a mobile app that can generate videos nearly indistinguishable from reality. AI tools can accelerate academic research by searching multiple online libraries to identify and summarize the most relevant information. They can predict novel protein structures, which can aid the development of cures for various diseases. Yet most people use generative AI for everyday tasks such as writing emails.
Since antiquity, humans have conveyed their most profound (and mundane) thoughts through writing. This includes great thinkers like Aristotle, storytellers such as Agatha Christie, and literary icons like James Joyce, all of whom were keen observers of the human condition (in addition to some oddballs like the Unabomber). Some who have stories to tell but lack the requisite writing skills employ ghostwriters instead. AI tools now provide everyone—be it a lawyer filing a brief or a student producing a last-minute essay—a “free” ghostwriter who demands neither money nor credit. The “writer” who supplies the AI prompt and claims credit for the writing may even be ignorant of the thoughts the AI ghostwriter squeezes into the text.
The ubiquity of AI tools means that every student is one suspicion away from being branded an AI-using cheat. In the summer of 2024, between my junior and senior years of high school, I took an online class through Texas Tech University’s remote learning institute to fulfill my high school’s speech requirement for graduation. The class explicitly prohibited the use of generative AI on all assignments and threatened disciplinary measures for any violation. I thought little of it; this was standard verbiage at my high school.
The class involved reading excerpts and writing essay responses. I submitted my first few essay responses without issue. But then, out of nowhere, the grader accused me of using AI to generate my work, even though I had written it myself, just like the essays before it. After a series of exchanges, I could only gather that the grader had run two AI detectors, both of which showed a greater than 50% chance that the essays were AI-generated.
Since I wanted to complete the assignment and the online class, I wrote and submitted two additional essays. The grader grudgingly accepted them and warned me not to use AI again (sigh!). I had narrowly escaped conviction in the kangaroo court of AI detection.
My grader treated AI detectors as infallible, but in fact they are notoriously unreliable. In one widely cited case, AI detectors concluded with nearly 100% certainty that the U.S. Constitution is AI-generated. Yet these capricious detectors are prevalent in our schools and universities. AI detectors attempt to identify statistical patterns characteristic of the outputs of ChatGPT and other large language models (LLMs) to determine whether a text is AI-generated. The problem is that ChatGPT is itself trained on human-written text and is designed to mimic human-written sentences, so the very patterns the detectors look for appear in plenty of genuine human writing. It is unreasonable to judge students’ work using an opaque AI-driven computer program with questionable accuracy.
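To see why this approach is so shaky, consider the perplexity heuristic that many detectors reportedly build on: if a language model finds a text highly predictable, the text gets flagged as AI-like. Below is a minimal sketch of that idea in Python; the choice of model (the open GPT-2) and the flagging threshold are illustrative assumptions of mine, not the internals of any actual commercial detector.

```python
# A toy perplexity-based "AI detector." Low perplexity means the model
# finds the text predictable, which these tools read as machine-like.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Score the text under the model: the loss is the mean negative
    # log-likelihood per token, so exp(loss) is the perplexity.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def looks_ai_generated(text: str, threshold: float = 60.0) -> bool:
    # The threshold is an arbitrary illustration, not a calibrated value.
    return perplexity(text) < threshold

print(looks_ai_generated("We the People of the United States, in Order "
                         "to form a more perfect Union..."))
```

The flaw is visible in the last function: formal, formulaic human prose, such as a legal document or the Constitution itself, is also highly predictable, so it scores exactly like machine output.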
Given my history with AI detection software, when I joined UT, I researched the university’s policy on detecting AI ghostwriting. Unlike the Texas Tech system I experienced, UT explicitly disallows the use of general-purpose AI detection software because of its “high risk” nature. In fact, many institutions of higher education have stepped back from AI detectors for similar reasons. This, however, leaves the door open for students to use AI not just to better understand educational material, but as a ghostwriter for essay assignments.
Teachers have used verbal discussion and oral exams since antiquity to assess how well students have absorbed the material taught to them. The downside of oral evaluation is that it takes far longer to determine a student’s level of comprehension and identify learning gaps. Educational systems built on written assignments can accommodate larger classes, since written work can be assessed asynchronously. When students use generative AI to ghostwrite for them, teachers are compelled to be more creative in fulfilling their role as sculptors of students’ minds; they must adjust their strategies for assessing how their sculptures are taking shape.
To be clear, generative AI tools are ubiquitous and here to stay. Prohibiting AI tools in written assignments and using AI detectors to police the ban is like trying to push back the tide with a spoon. These tools will continue to improve, and it is unreasonable to expect that students will not use them for asynchronously submitted assignments. Student evaluation, therefore, must be based on work produced in a controlled environment where students do not have access to AI ghostwriters. Evaluations in math classes at UT are already based on work produced synchronously under the supervision of proctors. Perhaps such a format is better suited to a math class, which deals in precise symbolic language and involves less writing than liberal arts classes. Regardless, at UT and elsewhere, professors in writing-intensive majors have shifted to in-class handwritten bluebook exams as one measure of student evaluation. Dr. Robbie Kubala, a professor in the Department of Philosophy, introduced bluebook exams in one of his classes this semester. Dr. Jeffrey Leon, also of the Department of Philosophy, returned to bluebook exams for the first time in five years.
Writing longhand in a bluebook exam is inconvenient for a generation that has always had the luxury of writing digitally. However, the ability to exclude AI ghostwriting makes this a workable, albeit imperfect, compromise. The rigidity of the bluebook exam disadvantages students who are night owls or extreme early birds, able to do their best work at hours when such exams cannot be scheduled. It also disadvantages students who do not perform well under time pressure. Nor does the bluebook format allow for deep research or work that takes more than a couple of hours to produce. This can be remedied by implementing oral evaluations for long research projects: the instructor can then test the student’s knowledge regardless of whether an AI tool was employed in the written report. Similar adaptations will have to be devised for purely remote learning.
Obviously, AI presents a challenge to academic evaluation, but it can be addressed by returning to bluebooks, oral exams, and other creative adjustments. And it isn’t all bad. Just as AI has revolutionized scientific research, it could substantially improve our education system. Still, if we don’t pay close attention to the potential consequences, innocent students are bound to be hurt.