I have updated my priors. Take-home exams are seriously unreliable. No academic can reliably ascertain talent and distinguish it from work generated by AI. Believing otherwise reflects either naivety or misplaced hubris.
Six months ago, I was actually optimistic about designing assessments that could outsmart Claude. But now, after teaching 170 students and learning from wider research, I think we need to fundamentally redesign teaching in an age of AI.
This is not a manifesto, but an open reflection to share with colleagues grappling with the same quandaries. Ideas and feedback are welcome!
"AI-Proof" Assessments?
In August 2024, I was concerned by the new risk of AI-assisted plagiarism, yet eager to experiment. Over the past semester, I piloted a two-pronged approach:
I crafted questions requiring deep understanding of course material and creative thinking, such as:
How do Hindu Nationalism and Indonesian Islamic movements counter Acemoglu and Johnson’s arguments in “The Narrow Corridor”?
Design interventions for South Korea’s low fertility, providing academic justification
How do environmental histories challenge conventional wisdom about Middle Eastern authoritarianism?
Initial tests against Claude suggested these questions were beyond AI’s capabilities…
Embracing AI in the Classroom
Eager to prepare my students for a workplace that increasingly rewards tech skills, I taught them how to use Generative AI. In every single lesson, students worked in pairs to pose questions to Claude, evaluate its responses, and improve upon them. We used it to:
Analyse land reform's role in East Asia's industrialisation
Interpret regression tables from key papers
Explore how Communism mediates the relationship between industrialisation and democratisation
As outlined in “From Hesitation to Mastery”, my goal was to create a level playing field where all students would be able to use AI effectively and critically. By highlighting its benefits, I hoped to encourage universal adoption, enabling me to set demanding assessments.
Confronting Reality
Now, having marked 170 undergraduate and postgraduate essays, I realise two major hurdles:
1. Tech-Hesitation
Even though I integrated AI into teaching, not all students feel equally comfortable. Recent research finds that women are 25% less likely to use generative AI globally. This gap persists across regions, sectors, and occupations - as discussed in my earlier essay, “Are Women Missing Out on AI?”.
Prohibiting AI proves counter-productive for several reasons:
Detection is difficult
Students may be disadvantaged if they do not use the latest technologies
Unenforceable bans exacerbate inequalities
At one Norwegian business school, when AI was permitted, usage intentions were similar between genders (87% men, 83% women). But with a ban, men’s intended usage dropped modestly to 67% while women’s plummeted to 46%. High-achieving women were especially rule-abiding.
This creates a fundamental challenge:
How can we accurately identify talent in anonymous assessments when we cannot determine who used AI?
Should we penalise students who do not harness the most advanced technology, or
Should we reward human-only essays that demonstrate genuine understanding?
2. AI’s Rapid Evolution
While my earlier questions could stump Claude, its capabilities have since advanced. Students needn’t rely on its training data alone. If you copy and paste my textbook on international development (which includes a breadth of theories, evidence, and references), then pose my assessment questions, the answers are excellent!
Hoisted with my own petard!
As AI rapidly improves, with better accuracy, fewer hallucinations, and broader knowledge, even our most challenging essay questions become easy for machines.
Quandaries!
We face an impossible task in fairly evaluating student work. When AI use is both undetectable and offers significant advantages, marking becomes fraught. Suppose a student submits a polished, well-argued essay: we cannot tell whether they used ChatGPT, so it’s like judging a race without knowing whether someone has sprinted or driven a car!
The Learning Crisis
This rapid advancement raises a deeper concern. If AI can consistently produce undetectable, persuasive essays, what motivates deep learning? Why should students spend hours wrestling with the knottiest problems and complex ideas when machines can craft equally compelling analysis?
We risk producing graduates who can submit excellent work while understanding very little. This threatens not just academic integrity but career readiness and productivity!
Sceptics might advocate returning to closed-book, in-person assessments. Perhaps these are useful as part of a diverse package of assessments, but other exams should also encourage our students to harness the technological frontier. This creates a puzzle: how can we encourage both deep learning and technological mastery?
A Path Forward: Two-Stage Assessment
After considerable reflection, I propose combining essays with oral examinations:
Written coursework, with AI use explicitly permitted
A follow-up 10-minute video interview where students explain their work
This structure promotes both AI literacy and authenticity:
Students gain technological confidence through essay preparation
Oral components verify genuine understanding
Yes, interviewing 170 students requires approximately four working days. However, before professors all unsubscribe: this investment may be vital, as it ensures we can actually motivate and reward talent!
As the world becomes increasingly trapped in the easy conveniences and entertainment of home-bound isolation, schools and universities could actually be a bit subversive: nurturing what makes us distinctly human - our ability to connect, communicate, and persuade through personal interaction.
Looking Forward
Assessment methods shape the entire university experience, influencing how students engage with material, manage time, and develop skills. We must ensure our assessments target what truly matters: the ability to leverage technology while developing the deep understanding necessary to explain and apply knowledge effectively.
While traditional take-home exams have become untenable, combining written work with oral assessment encourages both technological mastery and deep learning. University departments might also consider stronger integration of Generative AI, helping all students become tech-savvy. This combined approach acknowledges AI's utility while ensuring students develop the fundamental understanding and communication skills that make us human.
What do you think?
Related Essays
Crafting AI-Complementary Skills and Bulletproof Assessments
From Hesitation to Mastery: Integrating AI in University Teaching
Still Sceptical?
Today, I asked Claude: “why might the relationship between industrialisation and democratisation be different in communist societies? please write a 1500 word essay, with references.”
So how would you mark this?
The oral explanation is a really great idea! At least for now...
I work in AI and I don't believe we'll get to superintelligence with transformer-based LLMs alone. But I also don't think we're likely *that* many years from this all just being vanity projects clinging to the past. As in, the AI will be better than all of us, not just in the classroom but in the workplace.
I've already seen that tech startups aren't hiring anyone junior, because a few founders and first-hire senior engineers guiding the current models can replace a dozen junior engineers with Claude or GPT while still doing their own jobs. So in 10 years, how many senior engineers will there be?! Then what do we do? These questions are coming for all of us 😬
I think the 19th-century paintings are better to look at than the AI-generated images ;)