Is the Copyright Threat to Generative AI Overhyped? Implications of Kadrey v. Meta
In November 2023, Meta successfully had nearly all of the claims against it dismissed in the Kadrey v. Meta Platforms, Inc. suit, a victory with potential implications for other technology companies with generative AI tools. 2023 WL 8039640 (N.D. Cal. Nov. 20, 2023) (commonly referred to as Silverman v. Meta). Notably, Judge Vince Chhabria’s order granting the motion to dismiss rejected the theory that generative language models can themselves constitute infringing derivative works as “nonsensical.” Id., at *1. This is a particularly noteworthy development because technology companies with generative AI tools now have a decision that supports their contention that the language model itself does not infringe the copyright of the materials that the model is trained on.
A group of authors, including Richard Kadrey and Sarah Silverman, brought suit against Meta in the Northern District of California, alleging that the company infringed their copyrights by training its LLaMA AI language models on the authors’ books. See Kadrey v. Meta Platforms, Inc., 23-cv-03417-VC. They brought a slew of claims including direct and vicarious copyright infringement, circumvention of copyright protections in violation of the Digital Millennium Copyright Act (DMCA), unfair competition in violation of California law, negligence, and unjust enrichment.
The plaintiffs had asserted two theories of direct copyright infringement: 1) that Meta infringed their copyrights by training its AI tool on their books; and 2) that Meta’s AI tool is itself an infringing derivative work because it could not function without their books. Judge Chhabria rejected the latter theory. He clarified that a derivative work “is a work based upon one or more preexisting works in any form in which a work may be recast, transformed or adapted,” and that there was “no way to understand the LLaMA models themselves as a recasting or adaptation of any of the plaintiffs’ books.” Kadrey, at *1 (citing 17 U.S.C. § 101) (internal quotations omitted). If one were to put one of the LLaMA models beside one of the plaintiffs’ books, there would obviously be no similarity between the two. The plaintiffs had argued that the LLaMA models must be derivative works because they are trained on the text of the books, retain the expressive information extracted from them, and therefore “cannot function without the expressive information extracted from” the plaintiffs’ books. That theory is what Judge Chhabria called “nonsensical.”
The plaintiffs also asserted a vicarious copyright infringement claim, arguing that each output of the LLaMA language models constitutes an infringing derivative work and that users’ queries producing the outputs could form the basis of a vicarious liability claim because the output is “based on expressive information extracted from” the plaintiffs’ books. Compl. ¶ 44. This theory failed as well because the plaintiffs did not allege the contents of any output, “let alone of one that could be understood as recasting, transforming, or adapting the plaintiff’s books.” Kadrey, at *1. Further, the court emphasized that even though the plaintiffs alleged that their books had been duplicated in full as part of the LLaMA training process, they would need to allege that the contents of the LLaMA models’ outputs are “substantially similar” to their books to succeed in showing that the outputs constitute infringing derivative works. Id. Having neither pointed to a single output in their complaint nor alleged any instance of substantial similarity, the plaintiffs fell short of the requisite pleading standard.
With those copyright infringement theories dismissed, the plaintiffs’ remaining claims against Meta failed as well. The DMCA claim was dismissed because the plaintiffs had not plausibly alleged either distribution of their books or that the LLaMA models constitute infringing derivative works. Id., at *2. Likewise, the unfair competition, unjust enrichment, and negligence claims were each preempted because they relied on the same rights contained in the Copyright Act. Id.
As a result, the author plaintiffs are now left with a single claim for direct copyright infringement, resting on the theory that Meta directly infringed their copyrights by training its AI models on their books. The plaintiffs have amended their complaint accordingly, pointing to Meta’s own admissions that, as part of the training process for the LLaMA language models, it copied an online dataset called Books3, which has been known to include the plaintiffs’ books.
Meta’s successful motion to dismiss makes clear that generative AI tools themselves cannot amount to infringing derivative works of the copyrighted material they are trained upon. It does not foreclose the possibility of successful copyright infringement suits based on generative AI tools’ outputs; however, such cases are limited to instances where the plaintiffs can demonstrate substantial similarity between their copyrighted material and the content of the outputs. This severely limits the range of copyright suits that can be brought against the outputs of generative AI.