Encyclopedia Britannica and Merriam-Webster Sue ChatGPT Makers for 'Massive Copyright Infringement'

OpenAI has been accused of using nearly 100,000 copyrighted articles to train ChatGPT's large language model.

A row of Encyclopaedia Britannica volumes on a shelf, with black covers and gold lettering.
Mario Tama via Getty Images

Encyclopedia Britannica and Merriam-Webster are suing ChatGPT’s parent company, OpenAI, for alleged “massive copyright infringement.”

In court documents reviewed by Complex, Merriam-Webster, Inc., and its parent company, Encyclopedia Britannica, Inc., accuse OpenAI and its affiliated entities of using almost 100,000 copyrighted articles to train ChatGPT’s large language model (LLM). The lawsuit alleges that ChatGPT-based products use Britannica’s copyrighted content and are “cannibalizing traffic” to its websites with “AI-generated summaries.”

“ChatGPT starves web publishers like [Britannica] of revenue by generating responses to users’ queries that substitute, and directly compete with, the content from publishers like [Britannica],” reads the lawsuit. “To build its substitute products, [OpenAI] engage[s] in massive copying of [Britannica’s] and other web publishers’ copyrighted content without authorization or remuneration. Upon information and belief, ChatGPT has copied, and continues to copy, [Britannica’s] copyrighted content at massive scale—including both to train the LLM models that power ChatGPT and to supplement or ground that LLM’s knowledge base.”

The lawsuit also accuses OpenAI of falsely attributing information to Britannica and its subsidiary, Merriam-Webster. The plaintiffs argue that OpenAI’s AI technology “misleadingly omits portions of [Britannica’s] content without disclosing those omissions and displays the inaccurate reproductions alongside [Britannica’s] famous trademarks.”

OpenAI states that its LLMs are trained using information provided by third parties, information shared by users and researchers, and, most crucially for this lawsuit, “information that is publicly available on the internet.” As noted in the court documents, OpenAI has admitted that its LLMs have been trained on “vast amounts of data from the internet written by humans,” which also opens the door to factually inaccurate information being spliced into, or even attributed to, Britannica and Merriam-Webster’s copyrighted content.

Britannica is not the first company to legally challenge OpenAI, one of the leading artificial intelligence companies in the United States. Publishers including the New York Times, Ziff Davis, and various newspapers across the country are also pursuing legal action against OpenAI.
