When a judge calls out a fake case citation in a federal courtroom, a particular kind of humiliation follows. It is not the gavel-slamming, theatrical kind. It is quieter: the reading aloud of a case name that does not exist, the explanation of a decision that was never made, the revelation that the authority being relied on was invented by a program that had no idea it was doing anything unusual.
In June 2023, Judge Kevin Castel of the Southern District of New York, reviewing a brief in Mata v. Avianca, discovered that six of the opinions it cited did not exist. The two lawyers who filed it, Steven Schwartz and Peter LoDuca of Levidow, Levidow & Oberman, had used ChatGPT for their research and never verified that the cases it produced were real. Each was fined $5,000 and required to notify every judge whose name appeared in the fabricated decisions. The legal community treated it as an anomaly. It wasn’t.
Important Information
| Field | Details |
|---|---|
| The Case That Started It | Mata v. Avianca, Inc. (S.D.N.Y. 2023) — attorneys Steven Schwartz and Peter LoDuca submitted a brief containing six entirely fabricated court cases generated by ChatGPT; Judge Kevin Castel fined each attorney $5,000 and required them to personally notify the judges whose names appeared in the fake opinions |
| Scale of the Problem (2026) | Researcher Damien Charlotin’s hallucination database — tracked over 1,227 AI-fabricated citation incidents in courts worldwide; by late 2025, new cases arriving at two to three per day |
| Most Severe 2026 Sanction | Sixth Circuit, Whiting v. City of Athens, Tennessee (March 2026) — two attorneys fined $15,000 each plus full reimbursement of opposing party’s fees for submitting briefs with over two dozen fake or misrepresented citations |
| New Orleans, 2026 | Attorney John Walker fined $1,000 after ChatGPT generated 11 fabricated or mischaracterized case citations in a brief filed June 2025; Walker: “To this day it blows my mind it has that capability. I don’t have a good excuse.” |
| Stanford CodeX Research (2025) | General-purpose LLMs fabricate case citations in approximately 30–45% of legal research responses, depending on query complexity — the more obscure the legal question, the higher the fabrication rate |
| Claude’s Design Difference | Claude’s Constitutional AI training emphasizes acknowledging uncertainty; Claude is more likely to say “I’m not certain this case exists” than to generate a plausible-sounding citation for a nonexistent ruling |
| ChatGPT’s Legal Risk Profile | Designed to be helpful and generate complete, plausible answers — researcher Charlotin noted ChatGPT will produce confident-sounding legal citations rather than admit uncertainty; citation format (case name, volume, reporter, page) follows predictable patterns making fake cases easy to generate convincingly |
| Court Responses | Over 300 federal judges have adopted standing orders or local rules on AI use; some courts require disclosure specifying the exact tool used (“ChatGPT-4” or “Claude”); some require certification that every citation has been independently verified |
| Industry Guidance | Over 35 state bar associations issued formal guidance on AI use in legal practice as of early 2026 |
Researcher Damien Charlotin’s database of AI hallucination cases in courts around the world held more than 1,227 entries by the end of 2025. Earlier that year he told reporters he was logging roughly two new cases a week; by autumn the rate had climbed to two or three a day. ChatGPT was the tool behind most of these fabrications, but Gemini, Copilot, and occasionally Claude have also appeared in sanctions orders. Claude appears least often, and that difference in frequency is significant, but it needs careful interpretation.
The courtroom record has, unintentionally, become one of the most rigorous real-world tests of how different AI systems handle uncertainty in a high-stakes professional setting. Legal citation is a structured exercise with unambiguous right and wrong answers: a case either exists, with that docket number and that ruling, or it does not.
When an AI system produces a credible-looking citation for a nonexistent case, it is not being creative or context-aware; it is failing in a way that is easy to verify and, increasingly, easy to sanction. By the beginning of 2026, more than 300 federal judges had adopted standing orders expressly addressing the use of AI in court filings. Some of those orders require lawyers to identify the precise tool used, such as “ChatGPT-4” or “Claude,” rather than merely “AI software.”
The behavioral differences among the major models in legal research contexts come down to design decisions with real operational consequences. ChatGPT is trained to produce complete, helpful, confident answers. Because legal citation conventions follow highly predictable patterns (case name, volume, reporter, page number), a language model can generate text that looks exactly like a legitimate citation with nothing behind it. The model does not know it has invented a case; it has simply completed the pattern it was asked to complete. Claude, trained under Constitutional AI principles that emphasize expressing uncertainty and declining to assert facts it cannot verify, is more likely to say it cannot confirm a case exists than to invent one that sounds right. Users who want a definitive answer can find that tendency toward honest uncertainty frustrating. In a courtroom, it is the only acceptable behavior.
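To make the distinction concrete, here is a minimal sketch of why fabricated citations are so convincing: a string can match the expected citation format perfectly and still refer to nothing. The regex, the `VERIFIED_CITATIONS` set, and both case names below are hypothetical stand-ins, not details from any of the cases discussed here; a real verification workflow would query an authoritative source such as an official reporter or a commercial legal database rather than a hard-coded set.

```python
import re

# Illustrative pattern for a simplified U.S. reporter citation,
# e.g. "Smith v. Jones, 123 F.3d 456 (2d Cir. 1999)". Not a full Bluebook parser.
CITATION_PATTERN = re.compile(
    r"^(?P<case>.+ v\. .+), "
    r"(?P<volume>\d+) (?P<reporter>[A-Za-z0-9.]+) (?P<page>\d+) "
    r"\((?P<court_and_year>[^)]+)\)$"
)

# Hypothetical stand-in for a verified case-law index.
VERIFIED_CITATIONS = {
    ("123", "F.3d", "456"),
}

def is_well_formed(citation: str) -> bool:
    """Format check only: the string follows the expected citation pattern."""
    return CITATION_PATTERN.match(citation) is not None

def is_verified(citation: str) -> bool:
    """Existence check: the volume/reporter/page appears in the verified index."""
    match = CITATION_PATTERN.match(citation)
    return match is not None and (
        match["volume"], match["reporter"], match["page"]
    ) in VERIFIED_CITATIONS

real_case = "Smith v. Jones, 123 F.3d 456 (2d Cir. 1999)"        # hypothetical real case
fake_case = "Doe v. Acme Airlines, 987 F.3d 654 (9th Cir. 2020)"  # invented case

for cite in (real_case, fake_case):
    print(f"{cite!r}: well-formed={is_well_formed(cite)}, verified={is_verified(cite)}")
```

Both strings pass the format check; only the first survives the existence check. That gap, between looking like a citation and being one, is exactly what the sanctioned lawyers failed to close.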
According to a 2025 study from Stanford’s CodeX Center, general-purpose LLMs fabricate case citations in roughly 30 to 45 percent of legal research responses, depending on the complexity of the query. The fabrication rate rises with the obscurity of the legal question, which is precisely the kind a lawyer working in an unfamiliar area would actually need to research.

That figure should give most legal professionals pause: it means roughly one in three legal research responses from an unverified AI tool contains fictitious authority. In March 2026, the Sixth Circuit fined two Tennessee lawyers $15,000 each, plus full reimbursement of the opposing party’s fees, after their briefs across three consolidated appeals contained more than two dozen fake or misrepresented citations. It was described as the most severe such sanction yet.
John Walker, a New Orleans lawyer fined $1,000 in early 2026 after ChatGPT produced 11 fabricated or mischaracterized citations in a brief he filed in June of the previous year, put his reaction in terms that likely resonated with more lawyers than would admit it publicly. “To this day it blows my mind it has that capability,” he told the court. “I don’t have a good excuse.”
The 69-year-old had been practicing law for decades before a language model quietly filled his brief with cases that had never existed. Watching the trend accumulate, the Fifth Circuit issued its own guidance: if an AI’s answer seems too good to be true, if the case it found is remarkably helpful and precisely on point, it is probably not real. That is not a disclaimer from an AI researcher. It is a warning from a federal appeals court.
It is hard to ignore the irony that the judicial system, an institution built above all to determine what is true, has become the venue where the question of AI accuracy is being worked out in the most practical, costly, and public way.