The oral explanation is a really great idea! At least for now...
I work in AI and I don't believe we'll get to superintelligence with transformer-based LLMs alone. But I also don't think we're *that* many years away from all of this being a vanity project clinging to the past. As in, the AI will be better than all of us, not just in the classroom but in the workplace.
I've already seen that tech startups aren't hiring anyone junior, because a few founders and first-hire senior engineers guiding the current models can replace a dozen junior engineers with Claude or GPT while still doing their own jobs. So in 10 years, how many senior engineers will there be?! Then what do we do? These questions are coming for all of us 😬
I think the 19th-century paintings are better to look at than the AI-generated images ;)
It isn’t hard to become AI literate. Loads of my cohort (~10 years out of undergrad) are doing just fine at it. Allowing use of AI to draft and then thinking that a 10-minute oral interview will suffice to separate the wheat from the chaff seems like a waste. Moreover, students who actually do the hard work of learning to write themselves are far better placed to critically evaluate the products of AI (surely a constitutive element of “mastery”).
Do it all live. 3h exams didn’t kill anyone in the past century and they won’t in the one to come. More importantly, start this with K-12 education where (in my country at least) writing abilities have been worsening since long before AI.
There is real value in forcing your brain to do ALL the work itself, and there is no substitute for doing that real work. I had to do it that way, and if AI had existed back then, I can’t imagine how much lazier I would have been and how much less I’d have learned, even if I would likely have been able to produce comparable outputs.
Now I get to enjoy the benefits of AI, having actually learned how hard it is to produce the material that AI is trained on in the first place, and that seems like a far better outcome for today’s youngsters as well.
Entirely agree. A ten minute viva is too short to test and grade a student's ability. It is also a lot more open to abuse/subjectivity. As a minimum, two academics would need to attend each viva and agree a mark. But how would an external examiner then moderate the results, unless each viva is also recorded, so that a sample can be checked? If AI competence is seen as an important skill to acquire - I guess it is, although I suspect it's not that hard to do - then have a separate module to test this skill.
We have been using Zoom-based viva exams in my School for some time now, not just in response to AI, though their use has increased because of it. However, I now believe that online vivas/oral interviews/exams are compromised in terms of academic integrity. I have personally invigilated and spoken to students who have used AI in viva exams by running their LLM app in voice-input mode on a device propped against their laptop screen, so that my verbal questions are read directly into the app, and the student then reads off the LLM-generated response in real time. The only real give-away is the lack of hesitations, umms and ahhs. Students can also pre-load their LLM-generated essay into the app, so that questions relating to an essay they have been tasked to write can be easily answered, as well as knowledge-type exam questions.
This is not a solution to the big problems you noted. But one small tweak I have made is to require students to write all their papers in Google Docs, because I can then look at the history of the document. That is often revealing of the writing (or lack of writing!) process.
In my field at least (history), using AI generally means not doing the work. If you haven’t actually read the primary sources yourself, you are not doing the work and not learning. Therefore your stated logic goes out the window. Even if you are right and forbidding AI just reduces usage from 80% to 40-60%, I’d argue that’s already a much better situation - double to triple the number of students who are now learning something!
I’d say oral exams are a must, as are written in-class exams. If you do assign papers made at home, I think they are only worthwhile if you forbid AI use. Sure, people will cheat, but the fact that the honest students will not does not penalize them - it rewards them, because they will actually be learning!
P.S.
The car/runner analogy is great, but the upshot is totally different. If we understand that we run for the health benefits (learning) and not for the incidental outcome of changing location (an essay), we realize that any use of cars in the athletic race is a total waste of time. Forbidding their use is the only way to go.
I’m at a similar juncture to you: before GPT I liked the idea of take-home assessments (often programming projects in my courses), and then this last year I’ve been on leave so I haven’t had to update everything. But now I think it’s time for a serious update in the age of much better models with long context windows.
I do like this proposal. It might depend on the subject, but I think some combination of traditional closed-book exams, take-home assessments, and oral exams seems like a good approach.
That said, admittedly oral exams are pretty daunting to contemplate for some of the classes here at UC San Diego (over 300-400 students). It raises the question of whether TAs could/should help administer these and the tradeoffs with that.
You write "We risk producing graduates who can submit excellent work while understanding very little." This leads me to wonder what value-added to an employer is there in someone who understands the material vs. someone who knows how to ask AI to generate excellent work. From that point of view, your proposed assessment method rates students on their ability to *pitch* the intellectual work they've done, which is a skill that is not precisely the sort of intelligence which *generates* intellectual work.
Switching to the very big picture: during much of the agrarian and industrial eras, a lot of economically productive work was human muscles moving mass. This is infamous for creating economic inequality between the sexes. In the last half of the 20th century, that sort of work became nearly completely automated, devaluing physical strength. This aids gender equality, but more subtly it eliminates the cultural value placed on being a "big man", and it has drastically increased the economic and cultural value placed on intelligence, the cognitive skills measured by IQ.
But it's not at all clear to me that the economic/cultural/political dominance of the high-IQ class is going to last much longer. (And being decidedly a member of that class, I'm starting to be glad that I recently retired.) It may well be that in 20 years, what humans are paid well to do may be something completely different.
Now I recall something I realized once about education "back in the day" of circa 1840, specifically from the novel *Tom Sawyer*. The novel talks about the buzzing sound of the students studying, and I later connected this with the fact that small-group classes at my alma mater were called "recitations" -- for a long time, education was optimized for a world in which books were scarce and expensive, so the baseline of a lot of education was *memorizing* passages from books so that one would have the facts to hand.
Scrolling forward, books became cheaper and cheaper, to the point that simply memorizing facts was no longer economically valuable, and education started emphasizing "understanding" - complex, domain-specific reasoning processes that were still economically valuable. (And with human labor freed up from memorizing facts, more of it was available for "understanding" tasks.)
Scrolling forward to now: what happens if complex, but known, domain-specific reasoning processes can be done by AI? What do humans do next that is economically valuable, and how do we train them for that?
Not a professor. But just a note that it seems likely that AI will gain the ability to create convincing deepfakes where it appears your students are answering questions off-the-cuff on videochat but it's actually AI writing what they're "saying."
Also, there have only ever been three reasons students spent hours "wrestling with the knottiest problems and complex ideas":
1. They enjoyed it.
2. They weren't smart enough to rote memorize and then fake deep understanding.
3. Number two, plus they were too principled to cheat.
Also, have you talked with Alberto about this? https://www.thealgorithmicbridge.com/
Very much agree with keeping things in perspective here. In most of my classes - at both high quality state schools and fancy private universities - a majority of students learned by rote and didn’t gain much deeper understanding at all (as far as I could tell).
But when you’re motivated to learn - LLMs actually make it easier! Upload your readings, and suddenly you have a decent quality private tutor who you can bounce ideas off of, who can correct your more obvious misunderstandings, etc.
For quite a while, the goal of much modern education has been to get students to the point where they can tell if the computer is giving them the wrong answer. LLM-based AI just makes this more obvious.
The trivial example of a reason why a computer might give a wrong answer is a typo (you hit the wrong button on the calculator), but there are lots of other reasons, too, such as using a mathematical model in a situation where its assumptions don't hold. For example, if you take a sophisticated circuit simulation program used by electrical engineers and ask it to calculate the current you'll get if you put a million volts across a 1 Ohm resistor, it will use Ohm's Law to tell you that you'll have a million Amperes of current - but if you tried anything like that in real life, you'll soon have a charred mess instead of a resistor and end up with *zero* current!
Incidentally, Meta AI also gave the "1 million amperes" answer until I pressed further.
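For what it's worth, here is a back-of-the-envelope sketch of that mismatch in Python (the 0.25 W rating below is just an assumed figure for a typical small resistor, not something from the original example):

```python
# Naive Ohm's Law arithmetic vs. what the physical part could actually survive.
V = 1_000_000.0  # applied voltage, volts
R = 1.0          # resistance, ohms

current = V / R      # Ohm's Law: I = V / R  -> 1,000,000 A
power = V ** 2 / R   # power dissipated: P = V^2 / R -> 1e12 W

RATED_POWER_W = 0.25  # assumed rating for a typical small resistor

print(f"Model's answer: {current:,.0f} A")
print(f"Power the resistor would have to dissipate: {power:,.0f} W "
      f"(it is rated for roughly {RATED_POWER_W} W)")
# The equation happily reports a million amperes; the ~12 orders of magnitude
# of excess power dissipation are why the real answer is a charred mess.
```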
Somewhat more efficient as well as capable of assessing a broader range of competencies: the group oral exam. Say 5 randomly selected students have to record themselves meeting in person for 45 minutes and discussing a general topic announced <24 hours prior. This can be assessed for course-relevant knowledge; for intellectual skills like taking a position, making arguments, conceding complexities, making and responding to objections; and for leadership skills such as opening a new question, inviting silent members to participate, and collaboratively helping another develop their position; with penalties for dominating and other bad behaviors.
This can be graded relatively quickly with a rubric, at least at a coarse level, and lack of preparation shows up clearly. Still hard to scale up to really large courses, however, and the competencies being assessed have to be introduced and developed throughout the course.
Oral is best, but... time???
Maybe Zoom with small groups for interesting interactive discussion?
“If you copy and paste my textbook on international development (which includes a breadth of theories, evidence, and references), then pose my assessment questions, the answers are excellent!”
This is a drum I’ve been banging for a while now. People love to rag on the mistakes LLMs make when given no info beyond their training data. Sure, they are still unreliable at that (though much better already than they were a year ago). But they can read hundreds of pages in less than a second, and then do just about anything you ask with that information. This makes them *dramatically* more useful than most people seem to realize.
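Concretely, the pattern is just "paste the source material into the context, then ask." Here's a minimal sketch using the OpenAI Python client - the model name, file path, and question are placeholder assumptions, not anything from the post:

```python
# Minimal sketch: hand the model the full course text as context, then ask a question.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("textbook.txt") as f:  # placeholder path to the source material
    textbook = f.read()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any long-context model would do
    messages=[
        {"role": "system",
         "content": "Answer using only the provided text, and point to the relevant section."},
        {"role": "user",
         "content": textbook + "\n\nQuestion: Summarise the main theories of "
                                "international development covered here."},
    ],
)
print(response.choices[0].message.content)
```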
“Prohibiting AI proves counter-productive for several reasons . . . Students may be disadvantaged if they do not use the latest technologies”
As a law student whose grades are wholly determined by how I stack up against my peers, this has definitely run through my head (though obviously I adhere to all honor codes re AI use!).
But even for non-competitive work, not using AI disadvantages you relative to your best possible work. AI can help you brainstorm. It can point out errors in your thinking. I feel I’ve not only produced better work, but learned more, as a result of using AI this way.
I feel like grading is becoming theater - a performative activity focused on countering the capabilities of AI, or whatever technology is enabling plagiarism. It's an arms race that never ends, so perhaps the best approach is not to play. If we can't truly know whether the grades we are giving are justified, then maybe we should just move on and take some of the energy formerly spent on grading and devote it to trying to be better teachers. It's okay to admit that a thing has become impossible if it has indeed become impossible.
Higher education in general is kabuki: an elaborate, expensive, time-wasting ritual aimed at providing a credential demonstrating that the bearer is suitable for a desk job. Grade inflation has rendered Ph.D.s meaningless.
If we are to progress, higher ed should be limited to people with IQs over 145, so that those people can get on with exploring the frontiers and solidifying the structure of knowledge, without having to waste their days explaining things to idiots with a mere 130 IQ.
Verbal/oratory, stream-of-consciousness-style assessments.
Are all the references in the final essay real?
Yes! I checked them! And know them. It's a good essay!!
The tech has massively improved and educators will have to grapple with that going forward!
Yah!! And that's why I wrote this piece. I hope it's useful. But please don't sue if it prompts a need for expensive anti-depressants!
Just a few months ago references were made up.