Fair Warning about Fair Use

Legal Disclaimer: I am not an attorney so nothing in this blog should be construed as legal advice. It is not comprehensive and as you’re about to see, this is a nuanced subject that is very fact intensive. If you think your copyright or other intellectual property right has been infringed upon, please consult legal counsel well versed in intellectual property right matters.

Length Disclaimer: This post is long. I mean most of my posts are long but this one is really long. And it doesn’t encompass everything I wanted to talk about or could talk about. It’s just a sort of 101 guide tailored to address authors and AI and the use of books in LLM (and other tech).


The Statute

In general, the Copyright Act of 1976, as amended (the “Act”), governs the intellectual property right of copyright (there are other intellectual property rights such as trademark and patent but copyright is the primary source of conversation when we’re talking about intellectual property (“IP”) related to written works). In 1998, the Copyright Act was amended by the Digital Millennium Copyright Act (“DMCA”). Sometimes, the DMCA is referenced independently, but it is structurally part of the Copyright Act and will not be referred to separately here.

The Act grants creators a swathe of exclusive rights in their works, including the rights to reproduce and distribute the work, to create derivative works based on it, to sell, lend, or lease copies of it to the public, to perform it publicly, and to display it publicly. See, 17 USCA § 106.

Like almost all laws, however, the Act has exceptions. Many of the exceptions are narrow and extremely specific. For example, a library can reproduce one copy of a book without infringing on a copyright if the purpose is noncommercial, the library is open to the public or researchers not affiliated with the library, and the copy has a notice that it’s been copied under the Act’s exceptions. If they want to expand it to three copies, there are even stricter requirements. See, 17 USCA § 108.

17 USCA § 107 (“Section 107”) is different. This section, which outlines the doctrine of Fair Use, is a bit… murkier. Section 107 is quoted in full below.

Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include–
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

17 U.S.C.A. § 107

The Fair Use Factors

The four factors listed in Section 107 (the ones with the numbers before them) are the ones courts evaluate to determine whether a use of a copyrighted work is “fair use” and thus, non-infringing (assuming it’s not an obvious case set forth in Section 107’s preamble such as a teacher making copies for a classroom).

So, which factor matters most?

It depends.

Cartoon image of a green cricket shrugging with a sly smile. 
Image sourced via Pixabay.
Could we have just made rules? I mean sure, but that’s hard and annoying. Plus, how would we keep our fellow lawyers gainfully employed? And give them a career path from partner to judge? <– Politicians.

Both federal appellate courts and the United States Supreme Court have repeatedly stated that fair use cases are fact specific, there are no bright-line rules, and the factors should be weighed on a case-by-case basis. See, Am. Geophysical Union v. Texaco, Inc., 802 F. Supp. 1, 21 (S.D.N.Y. 1992), aff’d, 60 F.3d 913 (2d Cir. 1994) (no one factor is dispositive in weighing); Google LLC v. Oracle America, Inc., 141 S. Ct. 1183, 1197 (2021) (some factors may be more important in some cases than in others); Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 577 (1994) (there are no bright-line rules and each case requires an independent case-by-case analysis).

Because of this, the US Copyright Office created a Fair Use Index, a searchable database compiling summaries of major fair use cases by category and type of use.

I’m going to again repeat that I’m not a lawyer. This is not legal advice. It’s a compilation of generalities to give folks a better understanding of how complicated this can get and why, “It’s on the internet” does not necessarily equal fair use. It’s also to help people avoid pitfalls like “I thought I understood copyright and fair use because I did a Google then went and developed some AI.” Again, not legal advice but uh… I just don’t recommend that as a citizen of the world. For reasons I already touched on but also because you might get sued and honestly, getting sued sucks. 0/10 do not recommend.

Photo of a typewriter with a piece of paper with the words copyright claim.
Image sourced via Unsplash.
You’ve been served. With a fair warning about fair use. Which is definitely not legal advice. Something I can’t give because I’m not a lawyer. I’m going to say that probably four more times this post. Just in case someone didn’t read the whole thing as people are prone to do with legal stuff (and every email I write).

Factor One: Purpose and Character of Use

When someone asks me to “bullet point” legal analysis, I laugh. It simply doesn’t work like that. So in typical “it doesn’t work like that” fashion, there are sub-factors to the factors to consider:

  1. Commercial v. Non-Commercial Use
  2. The so-called “Transformative Use” of the work

Commercial/Non-Commercial Use

Commercial versus non-commercial use is pretty simple. It’s what it sounds like. Is the person using the allegedly infringed upon work profiting off it? There are some interesting exceptions here because (!) we (!) can’t (!) go (!) one (!) paragraph (!) without (!) exceptions (!) Have you gotten the point yet?

But one of the exceptions I want to point out is that a use can be considered commercial if use of the material infringed upon induces someone to purchase something else. See, Compaq Computer Corp. v. Ergonome, Inc., 387 F.3d 403, 409 (5th Cir. 2004) (inclusion of allegedly infringed on book on ergonomic hand positioning included with computer sales induced purchase of computers and reduced potential liability of computer company making use commercial).

In general, a commercial use case will be more likely to be considered infringing than a non-commercial use case but because there are more factors (and sub-factors) which may be given more weight (it depends, after all), commercial use is not dispositive by any means.

Transformative Use

A photo of a monarch butterfly in its chrysalis.
Photo sourced via Unsplash.
*Cracks knuckles* Okay, buckle up, y’all because this one is going to take some work. Actually by the time we get to the end this butterfly might have transformed from its chrysalis into a whole damn butterfly. But this is one of the most important parts for books and AI and all that so stick with me (and the butterfly).

A use is considered transformative if it adds something new to the work so that the new thing has a different function, purpose, or character from the original work. This analysis is about as squishy as it sounds. The case most frequently cited around this concept recently is the Supreme Court case, Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 143 S.Ct. 1258 (May 18, 2023). That case goes into great depth about what is and is not transformative. Andy Warhol painting a Campbell’s soup can is transformative. Why? Because despite it being basically an exact replica, the purpose of the soup can label was commercial – to induce people to buy soup through branding. Warhol’s purpose in painting it was to critique consumerism, something totally different. He transformed the work so it had a different purpose and character. However, Warhol’s creation of orange silkscreens based on a photographer’s photo of Prince (the subject of the actual case) was found to not be transformative because the uses by both artist and photographer were commercial and so similar as to not make the derivative Warhol created transformative.

All that to say when two people in similar fields (artists, basically) are creating something for essentially the same reason (commercial or otherwise) and the one copies the other, turning the copy orange isn’t enough to convince a court that it’s been transformed enough to make it new. Green probably doesn’t count, either.

In tech, use of copyrighted works has been found by appellate courts in several circuits to be transformative (but not necessarily non-infringing because again, there are other factors to consider). See, A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630 (4th Cir. 2009) (Archival of student essays in an online database used for plagiarism review is transformative of the original works); Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146 (9th Cir. 2007) (Google’s use of thumbnail images in a searchable index is transformative despite not altering the images at all); Authors Guild, Inc. v. HathiTrust, 755 F.3d 87 (2d Cir. 2014) (a digital library’s full-text searchable database of millions of books is transformative). TL;DR: Databases are transformative but again, that does not guarantee they’re not infringing. This is only one sub-factor of many.

How these circuits determine whether a work is transformative also, shockingly, depends. The 9th Circuit has suggested the new work should not be in competition with the old one to be considered transformative. The 2nd Circuit has suggested the new work must be creative or perhaps comment on the old work itself (such as in a parody or in Warhol’s soup can). It has backtracked a bit on the comment uh… comment since then. However, all these cases talk about the “author” of the new work, and closely examine authorial intent when determining a work’s transformative nature.

Fun fact. The current (but rapidly evolving) position of the US Copyright Office is that AI can’t be an author and that human authorship is a prerequisite to copyrights. However, surprising no one at this point, there are now some notable exceptions you can read about here. There is other tricky stuff about the repercussions around that, but I’m going to pass because we are still on factor one, everyone.

Graphic of a blue alien robot waving. 
Image sourced via Pixabay.
Who says this guy can’t be an author? The US Copyright Office, that’s who. And honestly probably for the best, this guy seems like he would write awful books.

Factor Two: Nature of the Copyrighted Work

Sub-factors: (1) Creative or factual; (2) Published or unpublished

These are actually not complicated. The more creative a work, the more likely it is to be protected by the Act. Works of fiction are more protectable than nonfiction. Nonfiction is more protectable than say… lists of numbers (yes, that’s literally a thing that can be copyrighted in some cases please don’t ask me how I know I don’t want to talk about it).

Significantly, exploitation of creativity will weigh against the transformative nature of a work (e.g. if a transformative work transforms in a way that exploits the original creative work it’s not given as much “credit” toward fair use). I mean hypothetically that could mean maybe… I don’t know… if you created an AI program that stole a bunch of super cool creative books written by real people then used that to uh… create other books. Maybe you transformed the books into something totally new (data, an algorithm, a software model, hell a new book making a radical comment on the old). But you did it by uh… infringing then used the thing to compete against the original thing. I mean I’m not a lawyer but that just doesn’t seem super fair to me.

An unpublished work will also be easier to protect than a published one, because a published work already got to have its day in the sun, essentially. It had its commercial debut and is now up for potential commentary and critique.

Factor Three: Amount and Substantiality of Portion Used

Sub-factors: (1) Qualitative; and (2) Quantitative

Qualitative

Using one line can be infringing if the one line is the heart of a work. Or it spoils the movie. See, Video Pipeline Inc. v. Buena Vista Home Entm’t, Inc., 342 F.3d 191, 201 (3d Cir. 2003). Using the whole thing can be appropriate if it’s for a purpose permitted under the other factors. It all really depends here. What is the heart of the work? And how does ripping it out damage the work, its reputation, and the author?

Cartoon of a yellow smiley face holding a white gloved hand to its lips in a shushing gesture.
Image via Pixabay.
Spoiler alert! The Titanic sinks. Darth Vader is Luke’s father. That kid can see dead people. Jon Snow… too soon? Now THERE is a sub-factor. When does it cease being a spoiler? Someone tell my ADHD that it doesn’t need to go figure that out.

Quantitative

A quantitative analysis is easier to understand (how much of the work was copied and used?), but it’s not applied in an easy-to-understand way. There’s no rule that says “You’re totally fine if you use less than 5% of the total thing.” Because of the other factors. And also because sometimes the quantity isn’t determined based on the total thing but on how much of the thing competes (Google, Inc., 804 F.3d at 223) and sometimes the quantity is determined based on how much of the thing is “relevant.” See, Am. Geophysical Union v. Texaco, Inc., 802 F. Supp. 1, 21 (S.D.N.Y. 1992), aff’d, 60 F.3d 913 (2d Cir. 1994). You can use 1% and infringe if there’s heavy weight given to other factors or use 80% and not infringe if there’s heavy weight given to this factor. So, again, while I’m not handing out legal advice here because I’m not a lawyer, it’s just… not hugely advisable to apply bright-line rules where there are none. Even if it would be easier.

Listen, I didn’t make the rules. Because if I had, there would uh… be some.

Factor Four: Effect on the Market

Sub-factors: (1) Direct market harm caused by the alleged infringing work; and (2) Harms that may result from other similar infringements in the future

Direct Market Harm

This is basically exactly what it sounds like. It encompasses loss of sales, profits, revenue, royalties, and potential licensing deals. Basically any allegedly infringing use can be seen as having market harm by depriving the copyright owner of sales. See, Bill Graham Archives, 448 F.3d (2d Cir. 2006). Also included is market harm for markets the work has not yet entered or fully exploited. To prove such harm, the owner of the copyright must prove that (1) such a market exists for it to enter; and (2) the copyright owner is likely to or has plans to enter that market.

What cannot be considered market harm are uses that criticize the work (even if such criticism results in loss of sales, revenue, licensing opportunity, etc.). This is because the Act does not supersede protected First Amendment rights to free speech.

Future Harm

Courts also take potential future harm into consideration. The concept here is basically, if we allow this one through, we’re setting a precedent for others like it, and what kind of economic impact will that have on the copyright owner?

This one might be important to pay attention to in some of the upcoming LLM cases (e.g. Silverman, et al. v. OpenAI, et al., N.D. Cal. 3:23-cv-03416) because the stakes there on future harm are potentially quite high not only for the authors involved in the lawsuit but for creators everywhere.

Random Other Things

In typical court fashion, there are also some other random things that have been tacked on during the years that courts consider when making fair use determinations. I won’t belabor the nuance because if you’re not asleep already you’re pretty much a hero. The bulleted key points on some bigger more relevant ones are below:

  • The use is consistent with industry practices
  • The use provides a significant benefit to the public
  • The infringement was knowing and in bad faith

Conclusion

TL;DR: Copyright infringement is bad. Fair use is complicated. And not always the fairest. There are rules but they sorta suck and can change with a light breeze. I’m not a lawyer. If you’re having an issue with your IP or you want to develop AI that uses IP that isn’t yours (including using an open source base built on data you’re not sure where it came from), call your In Real Life Lawyer.

Most importantly, keep creating.

Image of pretend photo editor that shows text that says "I'm not a Lawyer Count" with a 6 tally. 
Image created using Pixabay.
Hope that was sufficient to cover me.
