Thomson Reuters v. Ross Intelligence: AI Copyright Law and Fair Use on Trial

December 15, 2023
The Recorder

On Sept. 25, 2023, Judge Stephanos Bibas (sitting by designation in the District of Delaware) determined that fact questions surrounding issues of fair use and tortious interference required a jury to decide media conglomerate Thomson Reuters’s lawsuit against Ross Intelligence, a legal-research artificial intelligence startup. Thomson Reuters, which owns legal research platform WestLaw, alleges that Ross infringed its copyright by illegally copying WestLaw’s headnotes, the short summaries of points of law that appear in judicial opinions.

In recent months, technology companies have weathered lawsuits from authors, artists, programmers, and more traditional media companies. These plaintiffs argue that the use of their work to train generative AI software is copyright infringement. For example, at least two groups of authors allege that large technology companies used their work to train large language model (LLM) chatbots. In another instance, artists allege that Stability AI used the artists’ works to train its text-to-image generator to create AI-generated images “in the style of” those artists, meaning images that others would accept as works created by those artists. Similarly, Getty Images has initiated a copyright infringement suit against Stability AI for using Getty’s photos to train its model and create AI-generated images.

In yet another lawsuit, a programmer and lawyer allege that their copyrighted source code was scraped to train an AI code-generator. A common defense for these technology companies is that the use of existing writing, art, photography, and code to train generative AI systems is fair use of copyrighted work.

Case Background (Using Legal Memos as Machine-Learning Training Data)

WestLaw has a registered copyright on “original and revised text and compilation of legal material,” which includes its headnotes and key number system. Ross Intelligence, a legal research startup, sought to create a search engine that would produce direct quotations from judicial opinions upon entry of a natural language question. After unsuccessfully attempting to acquire a license to use WestLaw’s legal material to train its search engine, Ross Intelligence retained a third party to create memos with legal questions and answers; that third party then did so using a text-scraping bot. Ross then converted the memos into usable machine-learning training data. Thomson Reuters contends that the 25,000 questions in those memos were essentially WestLaw headnotes. Ross admits that the headnotes “influenced” the questions but maintains that lawyers ultimately drafted them.

Relying on 2,830 questions that it contended Ross’s own expert admitted were copied, Thomson Reuters moved for summary judgment on its copyright infringement claim and its tortious interference with contract claims. Both sides moved for summary judgment on Ross’s fair-use defense, and Ross moved for summary judgment on its preemption defense to Thomson Reuters’s tortious interference claims.

Judge Bibas denied Thomson Reuters’s motion for summary judgment on its copyright infringement claim, finding that a dispute of fact existed over how closely WestLaw’s headnotes resemble uncopyrightable judicial opinions, as well as over whether Ross’s questions were substantially similar to WestLaw’s headnotes. Judge Bibas also denied both sides’ motions for summary judgment on Ross’s fair use defense. On Thomson Reuters’s tortious interference claims, Judge Bibas determined that while one claim was preempted, the other two must go to a jury. Finally, Judge Bibas entered summary judgment for Thomson Reuters on a number of Ross’s affirmative defenses.

Key Takeaway Concerning Fair Use

Critically, in denying the parties’ cross-motions for summary judgment on Ross’s fair use defense, the court concluded that the purpose and character of Ross’s use of WestLaw’s headnotes turned on contested facts. Ross had argued that intermediate copying caselaw is applicable and supports a finding of fair use. Such caselaw holds that users who copy material to discover unprotectable information or as a “minor step” in creating a new product are engaging in fair use. However, the court reasoned that whether such caselaw was applicable depended on a disputed fact: Did Ross’s AI only use WestLaw headnotes to learn language patterns such that its search engine can produce judicial opinion quotes, or does its AI use the headnotes to replicate the creative expression of WestLaw’s attorney editors in drafting those headnotes?

The court opined that if Ross’s AI only studied language patterns in WestLaw headnotes to learn how to produce judicial opinion quotes, which are unprotected, then such use was “transformative.” (As the court later noted, in assessing the substantiality of Ross’s copying, “if Ross’s AI works the way that it says, it is likely fair use because it produces only the opinion, not the original expression.”) However, the court acknowledged that if Thomson Reuters is correct that Ross trained its AI on WestLaw headnotes in order to replicate the creative drafting of those headnotes, then Ross’s copying was not merely an “intermediate” “minor step” toward a transformative use. Importantly, the court observed that “[h]ow Ross’s AI works and what output it produces remain disputed.”

The court’s unwillingness to choose between these dueling positions has implications for lawsuits challenging the use of copyrighted material to train generative AI systems. Indeed, as noted above, there are currently other technology companies defending AI copyright suits by relying on the fair use defense. Many of these technology companies have submitted public comments in response to the U.S. Copyright Office’s August 2023 notice of inquiry on copyright and AI, staking out a position that the use of copyrighted materials is for analysis of statistical relationships (e.g., between words and how they are used in writing), much like the act of reading a book and learning the facts and ideas within it.

In short, many more jury trials will be required if judges must refrain from deciding whether the purpose of a generative AI system’s use of copyrighted material to learn language patterns is to produce a new product or to replicate the creative expression of the copyrighted material. Here, the output of Ross’s AI remains disputed in part because its output so closely resembles the copied WestLaw headnotes (which themselves closely resemble the judicial opinions analyzed).

However, judges may be more inclined to decide the purpose of a generative AI system’s use of copyrighted material to conduct statistical analysis when the output of the AI is markedly different, like the amalgamations of a text-to-image generator. In any case, companies developing generative AI systems, particularly those that train on material scraped from the internet, should be careful to emphasize (and consistently document) that the purpose of their systems’ machine learning is not to copy the creative expression of the scraped material, but that the training is an intermediate minor step toward creating something wholly new.

Reprinted with permission from the December 15, 2023 issue of The Recorder. © 2023 ALM Media Properties, LLC. Further duplication without permission is prohibited. All rights reserved.