Professor Shyam Balganesh



The ingestion process [used to train AI models] involves copying the dataset locally; this raises copyright issues.

Shyamkrishna ("Shyam") Balganesh writes and teaches in the areas of copyright law, intellectual property, and legal theory at Columbia Law School. He has written extensively on how intellectual property and innovation policy can benefit from ideas, concepts, and structures drawn from other areas of the common law, especially private law. His recent work explores the interaction between copyright law and key institutional features of the American legal system. He is also working on a series of articles advancing an account of “legal internalism” that explains the shape and trajectory of legal thinking. Balganesh’s work has appeared in leading law journals, including the Columbia Law Review, Harvard Law Review, Stanford Law Review, University of Pennsylvania Law Review, and Yale Law Journal. He is also a co-author of sections of the leading copyright law treatise Nimmer on Copyright. In 2017, he was elected a member of the American Law Institute, and since 2015 he has served as an adviser to the Restatement of the Law, Copyright.

Talks by Professor Shyam Balganesh


AI Models & Copyright Battles

Large language models (LLMs) are trained on vast, nearly unfathomable amounts of data—data that is now reshaping the very fields from which it was sourced, including literature, journalism, music, and photography. As a result, these models have sparked high-stakes litigation and raised novel legal questions about ownership and intellectual property, both in the AI training process and in the outputs the models produce. In this conversation, we explore the intersection of AI training and copyright law with Professor Shyamkrishna (Shyam) Balganesh of Columbia Law School, a prominent legal scholar who has been closely examining these emerging issues.

At the core of the debate is how these models are trained—using vast datasets that combine both copyrighted and public domain material. LLMs ingest this data to absorb patterns that power their ability to generate intelligent responses, yet their reliance on copyrighted works raises concerns about unauthorized use. Professor Balganesh walks us through the technical aspects of how these models are built, explaining the intricacies of data ingestion and why the training process involves copying datasets onto local servers, potentially leading to copyright violations.

The fair use doctrine has emerged as a central argument in the defense of using copyrighted material in AI training, but this defense has its limitations. Professor Balganesh details how the courts are grappling with balancing innovation with intellectual property rights. While AI companies claim their use of copyrighted works falls under fair use, critics argue that fair use cannot “scale” with the models and that the models reproduce creative outputs in ways that violate authors' rights. Shyam examines the boundaries of this argument and where the law may be heading.

These legal questions are playing out in real time, with high-profile cases capturing national attention. Professor Balganesh shares his insights on key lawsuits, including the New York Times’ challenge to OpenAI, the Suno AI music case brought by Universal Music Group, and Getty Images’ case against Stability AI, maker of Stable Diffusion. While these cases remain pending at the time of the interview, Shyam predicts a shift towards increased licensing regimes, in which AI developers will secure permissions to use copyrighted material for training their models.