Project essay grader

Can artificial intelligence systems mark more accurately than humans? Definitely, and they have been able to do so since the 1960s. In 1968, Dr Ellis Batten Page developed Project Essay Grade (PEG), an automated essay-marking system. If you gave it the same essay on two different days, it awarded it the same mark, which is definitely not always the case with human markers. Not only that, but its marks did tend to correlate quite closely with those of human markers. If you took the average marks awarded by a group of human markers and compared them with PEG, PEG agreed with the average more than any individual human marker did. So you could argue that it was more reliable than any individual human.
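To make that comparison concrete, here is a minimal Python sketch of it. Everything in it is invented for illustration: the three human markers, the marks themselves, and the choice of Pearson correlation against the group average as the agreement measure.

```python
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation between two lists of marks."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical marks out of 10: rows are markers, columns are essays.
humans = {
    "marker A": [6, 8, 5, 9, 4, 7],
    "marker B": [5, 9, 4, 8, 6, 8],
    "marker C": [7, 7, 3, 9, 4, 6],
}
machine = [6, 8, 4, 9, 5, 7]  # the automated system's marks

# The average human mark per essay is the reference point.
average = [mean(col) for col in zip(*humans.values())]

# One agreement figure per marker; with these made-up marks the
# machine tracks the group average most closely.
for name, marks in list(humans.items()) + [("machine", machine)]:
    print(f"{name}: r = {pearson(marks, average):.3f}")
```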

So why haven't we all been using AI marking systems ever since? It's not because they are unreliable. It's because of the impact they have on teaching and learning. Once students know that AI is marking their essays, they want to know what it rewards and how it rewards it. Many early AI systems rewarded the length of an essay, simply because essay length does tend to correlate with essay quality. But, of course, correlation is not causation. Once people know the AI is rewarding length, they can start to game the system. In 2001, a group of researchers found that repeating the same paragraph 37 times was sufficient to fool one popular automated essay marker.
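It is easy to see why a length-based marker is gameable. The toy grader below is a deliberately crude sketch, not any real system's scoring rule: it rewards word count, so repeating one paragraph 37 times earns the top mark.

```python
def naive_grade(essay: str, max_mark: int = 8) -> int:
    """Toy grader: more words means a higher mark, capped at max_mark."""
    return min(max_mark, len(essay.split()) // 50)

paragraph = (
    "Mobile phones distract students from their work and should "
    "therefore be banned from classrooms during the school day. "
)

honest_essay = paragraph * 4   # a short, genuine attempt
gamed_essay = paragraph * 37   # the same paragraph repeated 37 times

print(naive_grade(honest_essay))  # 1 -- a low mark
print(naive_grade(gamed_essay))   # 8 -- top marks for saying nothing new
```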

This, essentially, is the problem with AI marking. It's easy for it to be more consistent than humans, because humans are not great at being consistent. But while humans might not be consistent, they can't be fooled by tricks like writing the same paragraph 37 times. In a way, the justification for human marking is a bit like the justification for a jury system. It may well be inconsistent and unwieldy and error-prone, but it will have a backbone of common sense that prevents really egregious and absurd decisions.

And this, for me, has always been the challenge of AI marking. It's not about how well it does the job to begin with. It's about how students respond when they know their work is being marked by AI, and how the AI then responds to that.

1968 and 2001 are the distant past in the world of AI.

Putting artificial intelligence marking to the test

ChatGPT is orders of magnitude more sophisticated than older AI models. So how does it cope with deliberate attempts to game it? It depends…

I took a good essay on Romeo and Juliet and asked ChatGPT to mark it out of four levels. I then took the Romeo and Juliet essay and replaced all the mentions of Romeo with “Pip”, all the mentions of the Nurse with “Magwitch”, and all the mentions of Romeo and Juliet with Great Expectations. I then pasted this essay into ChatGPT, said it was an essay on the first chapter of Great Expectations, and asked it to mark it out of four levels. This resulted in some entertaining paragraphs.
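The disguise itself is a mechanical find-and-replace. Here is a rough sketch of it, with an invented sample sentence and the substitution pairs described above; the leftover mismatches (Shakespeare apparently writing about Pip) are presumably what made the output entertaining.

```python
# Longer phrases go first, so "Romeo and Juliet" is swapped before
# the bare "Romeo" it contains.
substitutions = [
    ("Romeo and Juliet", "Great Expectations"),
    ("the Nurse", "Magwitch"),
    ("Romeo", "Pip"),
]

def disguise(essay: str) -> str:
    for old, new in substitutions:
        essay = essay.replace(old, new)
    return essay

sample = ("In Romeo and Juliet, Shakespeare presents Romeo as impulsive, "
          "while the Nurse serves as a comic confidante.")
print(disguise(sample))
# -> "In Great Expectations, Shakespeare presents Pip as impulsive,
#     while Magwitch serves as a comic confidante."
```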

I took a model essay on why we should ban cigarettes, and asked it to mark it out of eight levels. It gave it a top grade and a nice comment. As with the literature essay, it was so far, so good. But that was before I took the banning cigarettes model essay and replaced all the mentions of “cigarette” and “smoke” with “mobile phones”. I then pasted this essay into ChatGPT and said it was an essay on why we should ban mobile phones. This resulted in some entertaining sentences.


So is it case closed? Is this just another easily gameable AI system? Not quite. While it is relatively straightforward to game ChatGPT for literature essays, it is much harder to game it for pure writing assessments. This makes sense, given what we know about ChatGPT’s strengths.
