There have been several high-profile cases of lawyers in the United States using AI to generate case references for court proceedings, only to discover that the AI had "hallucinated" cases that did not exist. Hallucinations are one of the limitations of language models like ChatGPT: the model occasionally generates outputs that are factually incorrect and do not correspond to information in the training data, such as a fake case reference. These hallucinations occur because tools like ChatGPT have no human-like comprehension of our underlying reality – ChatGPT is merely ‘stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning’. The providers of these tools are aware of the risk of hallucination, and OpenAI warns ChatGPT users that, while it has safeguards in place, ‘the system may occasionally generate incorrect or misleading information’. As we have discussed, guidance was recently issued for the UK judiciary about the uses and risks of generative AI in courts and tribunals, including the risk that AI “may make up fictitious cases, citations or quotes”. Other bodies, such as the Bar Council and the Solicitors Regulation Authority, have issued guidance warning of the same possibility.
The case of Harber v Commissioners for His Majesty’s Revenue and Customs is one of the first UK decisions to find that cases cited by a litigant were not genuine judgments but had been created by an AI system such as ChatGPT. It is a recent First-tier Tribunal (Tax Chamber) judgment in which a self-represented appellant taxpayer had provided the tribunal with citations and summaries for nine First-tier Tribunal decisions which appeared to support her position. While these citations and summaries had "some traits that [were] superficially consistent with actual judicial decisions", they were ultimately found not to be genuine case citations.
Here the respondent gave evidence that they had checked each of the cases provided by the self-represented appellant against the cases in the tribunal's records, searching not only on the fictitious party names and the year of each cited decision but also extending the search to several years on either side – yet they were unable to find any such cases. The Tribunal itself also carried out checks for those cases in its own database and in other legal repositories. The litigant in person accepted that it was possible that the cases had been generated by AI, stating that they had been provided to her by "a friend in a solicitor's office" whom she had asked to assist with her appeal, and she had no alternative explanation as to why the cases could not be located on any available database of Tribunal judgments.
The Tribunal ultimately concluded that the cited cases did not exist and had been generated by an AI system such as ChatGPT. The Tribunal accepted that the taxpayer was unaware that the cases were fabricated, and the incident made no difference to her unsuccessful appeal. However, the Tribunal did address some of the problems of citing invented judgments, endorsing comments made in the highly publicised US District Court decision Mata v Avianca, where US lawyers submitted non-existent judicial opinions with fake quotes and citations created by the artificial intelligence tool ChatGPT. In Mata the US Court noted:
Many harms flow from the submission of fake opinions. The opposing party wastes time and money in exposing the deception. The Court's time is taken from other important endeavors. The client may be deprived of arguments based on authentic judicial precedents. There is potential harm to the reputation of judges and courts whose names are falsely invoked as authors of the bogus opinions and to the reputation of a party attributed with fictional conduct. It promotes cynicism about the legal profession and the…judicial system. And a future litigant may be tempted to defy a judicial ruling by disingenuously claiming doubt about its authenticity.
Although this was a litigant in person, it seems likely that the Tribunal would have taken a harder line with a lawyer representing a client, and there may well have been a referral to a regulator such as the SRA. The UK judicial guidance on AI notes that chatbots such as ChatGPT may be the only source of advice or assistance some self-represented litigants receive, that such litigants may lack the skills to independently verify legal information provided by AI chatbots, and that they may not be aware that these tools are prone to error. Lawyers and the UK courts will therefore need to be alert to the increasing risk that hallucinated material will be relied upon by unrepresented litigants. If you are looking for judicial authorities, ensure that you search reputable databases.
As the Tribunal itself observed in Harber:

We acknowledge that providing fictitious cases in reasonable excuse tax appeals is likely to have less impact on the outcome than in many other types of litigation, both because the law on reasonable excuse is well-settled, and because the task of a Tribunal is to consider how that law applies to the particular facts of each appellant's case. But that does not mean that citing invented judgments is harmless. It causes the Tribunal and HMRC to waste time and public money, and this reduces the resources available to progress the cases of other court users who are waiting for their appeals to be determined.