AI Models & Copyright Battles

An interview with Professor Shyam Balganesh

Large language models (LLMs) are trained on vast, nearly unfathomable amounts of data, and that data is now reshaping the very fields from which it was sourced, including literature, journalism, music, and photography. As a result, these models have sparked high-stakes litigation and raised novel legal questions about ownership and intellectual property, both in the AI training process and in the output the models produce. In this conversation, we explore the intersection of AI training and copyright law with Professor Shyamkrishna (Shyam) Balganesh of Columbia Law School, a prominent legal scholar who has been closely examining these emerging issues.

At the core of the debate is how these models are trained: on vast datasets that combine copyrighted and public domain material. LLMs ingest this data to absorb the patterns that power their ability to generate intelligent responses, yet their reliance on copyrighted works raises concerns about unauthorized use. Professor Balganesh walks us through the technical aspects of how these models are built, explaining the intricacies of data ingestion and why the training process involves copying datasets onto local servers, a step that may itself lead to copyright violations.

The fair use doctrine has emerged as a central argument in the defense of using copyrighted material in AI training, but this defense has its limitations. Professor Balganesh details how the courts are grappling with how to balance innovation against intellectual property rights. While AI companies claim their use of copyrighted works falls under fair use, critics argue that fair use cannot “scale” with the models and that the models reproduce creative outputs in ways that violate authors' rights. Professor Balganesh examines the boundaries of this argument and where the law may be heading.

These legal questions are playing out in real time, with high-profile cases capturing national attention. Professor Balganesh shares his insights on key lawsuits, including the New York Times’ challenge to OpenAI, the Suno AI music case brought by Universal Music Group, and Getty Images' case against Stability AI, the developer of Stable Diffusion. While these cases remain pending at the time of the interview, Professor Balganesh predicts a shift toward licensing regimes, in which AI developers secure permission to use copyrighted material for training their models.


  • Attorney CLE accreditation 

🎧 Listen and Subscribe to the AI Lawyer Podcast

Michelle A. Reed discusses After the Data Breach
Michelle A. Reed discusses Cyber Defense: Private Funds & Banks
Michelle A. Reed discusses Cyber Defense: Private Funds & Banks (Part 2)
Ira Kustin and Sherrese Smith discuss Cyber Risk at Private Funds
Elena M. Paul discusses Dance as Intellectual Property
Prof. Naomi R. Cahn discusses Digital Asset Planning
Judge Andrew Napolitano discusses Domestic Spying and the NSA
James Anderson discusses Driverless Cars—A Shift in Risk
Doreen Small and Ali Grace Marquart discuss Fashion Modeling Law: Age, Weight, & Photoshop
Heather McDonald discusses Fashion Piracy & Anti-Counterfeiting
Prof. Nadine Strossen discusses Free Speech in a Social Media World
Prof. Jeffrey Rosen discusses Government Surveillance: Privacy & Technology
Prof. Neil Richards discusses Human Information Privacy
Prof. Neil Richards discusses Human Information Privacy (Part 2)
Prof. Emily Murphy discusses Memory Evidence (Part 2)
Hina Shamsi discusses Military Drones and Targeted Killing
Prof. Daniel Capra discusses Police Power and Personal Rights
Prof. Daniel Capra discusses Police Power and Personal Rights (Part 2)
Prof. I. Bennett Capers discusses Police Technology - From Body Cameras to Facial Recognition
Prof. Amy Gajda discusses Press Freedom vs. Privacy
Prof. Amy Gajda discusses Press Freedom vs. Privacy (Part 2)
Prof. Joel Reidenberg discusses Privacy & Technology in Today's Schools
Prof. Jeffrey Rosen discusses Privacy vs. Government Tech
Chairman Christopher Giancarlo discusses Regulating Cryptocurrency after FTX
Scott Skinner-Thompson discusses Sexual Privacy and Government "Outing"
Daniel Levy discusses Stolen Art, Forgeries & Nazi Plunder
John Heitmann and Jameson Dempsey discuss The Internet of Things – The Latest Frontier
John Heitmann and Jameson Dempsey discuss The Internet of Things – The Latest Frontier (Part 2)
Prof. Eric Goldman discusses The Law of Deplatforming