Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head

0x815@feddit.de · 2 years ago

Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head

db0@lemmy.dbzer0.com · 2 years ago

I agree with google, only I go a step further and say any AI model trained on public data should likewise be public for all and have its data sources public as well. Can’t have it both ways Google.

Domi@lemmy.secnd.me · 2 years ago

To be fair, Google releases a lot of models as open source: https://huggingface.co/google

Using public content to create public models is also fine in my book.

But since it’s Google I’m also sure they are doing a lot of shady stuff behind closed doors.

FaceDeer@kbin.social · 2 years ago

Copyright law already allows generative AI systems to scrape the internet. You need to change the law to forbid something, it isn’t forbidden by default. Currently, if something is published publicly then it can be read and learned from by anyone (or anything) that can see it. Copyright law only prevents making copies of it, which a large language model does not do when trained on it.

maynarkh@feddit.nl · 2 years ago

A lot of licensing prevents or constrains creating derivative works and monetizing them. The question is for example if you train an AI on GPL code, does the output of the model constitute a derivative work?

If yes, Github Copilot is illegal as it produces code that should comply to multiple conflicting license requirements. If no, I can write some simple AI that is “trained” to regurgitate its output on a prompt, and run a leaked copy of Windows through it, then go around selling Binbows and MSFT can’t do anything about it.

The truth is mostly between the two, this is just piracy, which always has been a gray area because of the difficulty of prosecuting it, previously because the perpetrators were many and hard to find, now it’s because the perpetrators are billion dollar companies with expensive lawyer teams.

FaceDeer@kbin.social · 2 years ago

The question is for example if you train an AI on GPL code, does the output of the model constitute a derivative work?

This question is completely independent of whether the code was generated by an AI or a human. You compare code A with code B, and if the judge and jury agree that code A is a derivative work of code B then you win the case. If the two bodies of work don’t have sufficient similarities then they aren’t derivative.

If no, I can write some simple AI that is “trained” to regurgitate its output on a prompt

You’ve reinvented copy-and-paste, not an “AI.” AIs are deliberately designed to not copy-and-paste. What would be the point of one that did? Nobody wants that.

Filtering the code through something you call an AI isn’t going to have any impact on whether you get sued. If the resulting code looks like copyrighted code, then you’re in trouble. If it doesn’t look like copyrighted code then you’re fine.

nous@programming.dev · 2 years ago

If the resulting code looks like copyrighted code, then you’re in trouble. If it doesn’t look like copyrighted code then you’re fine.

^^ Very much this.

Loads of people are treating the process of AI creating works as either violating copyright or not. But that is not how copyright works. It applies to the output of a process not the process itself. If someone ends up writing something that happens to be a copy of something they read before - that is a violation of copy write laws. If someone uses various works and creates something new and unique then that is not a violation. It does not - at this point in time at least - matter if that someone is a real person or an AI.

AI can both violate copy write on one work and not on another. Each case is independent and would need to be legislated differently. But AI can produce so much content so quickly that it creates a real problem for a case by case analysis of copy write infringement. So it is quite likely the laws will need to change to account for this and will likely need to treat AI works differently from human created works. Which is a very hard thing to actually deal with.

Now, one could also argue the model itself is a violation of copyright. But that IMO is a stretch - a model is nothing like the original work and the copyright law also does not cover this case. It would need to be taken to court to really decide on if this is allowed or not.

Personally I don’t think the conversation should be on what the laws currently allow - they were not designed for this. But instead what the laws should allow. So we can steer the conversation towards a better future. Lots of artists are expressing their distaste for AI models to be trained on their works - if enough people do this laws can be crafted to backup this view.

lostmypasswordanew@feddit.de · 2 years ago

An AI model is a derivative work of its training data and thus a copyright violation if the training data is copyrighted.

BlameThePeacock@lemmy.ca · 2 years ago

A human is a derivative work of its training data, thus a copyright violation if the training data is copyrighted.

The difference between a human and ai is getting much smaller all the time. The training process is essentially the same at this point, show them a bunch of examples and then have them practice and provide feedback.

If that human is trained to draw on Disney art, then goes on to create similar style art for sale that isn’t a copyright infringement. Nor should it be.

Phanatik@kbin.social · 2 years ago

This is stupid and I’ll tell you why.
As humans, we have a perception filter. This filter is unique to every individual because it’s fed by our experiences and emotions. Artists make great use of this by producing art which leverages their view of the world, it’s why Van Gogh or Picasso is interesting because they had a unique view of the world that is shown through their work.
These bots do not have perception filters. They’re designed to break down whatever they’re trained on into numbers and decipher how the style is constructed so it can replicate it. It has no intention or purpose behind any of its decisions beyond straight replication.
You would be correct if a human’s only goal was to replicate Van Gogh’s style but that’s not every artist. With these art bots, that’s the only goal that they will ever have.

I have to repeat this every time there’s a discussion on LLM or art bots:
The imitation of intelligence does not equate to actual intelligence.

BlameThePeacock@lemmy.ca · 2 years ago

You’re completely wrong, and I’ll tell you why.

None of what you said matters, perception filters, intent, intelligence… it’s all irrelevant to the discussion.

Copyright infringement only gives certain rights, and at least here in Canada using them to generate a model isn’t one of those. Rights are for things like distribution, reproduction, public performance, communication, and exhibition. US law says you can’t “Prepare derivative works based upon the work.” but the model isn’t a derivative work because it’s not really a work at all, you can’t even visually look at the model. You can’t copyright an algorithm in the US or Canada.

Only the created art should be scrutinized for copyright infringement, and these systems can generate both (just like a human can).

Any enforcement should then be handled when that protected work is then used to infringe on the actual rights of the copyright holder.

Phanatik@kbin.social · 2 years ago

I wasn’t talking about copyright law in regards to the model itself.

I was talking about what is/isn’t grounds for plagiarism. I strongly disagree with the idea that artists and art bots go through the same process. They don’t and it’s reductive to claim otherwise. It negatively impacts the perception of artists’ work to assert that these models can automate a creative process which might not even involve looking at other artists’ work because humans are able to create on their own.

A person who has never looked upon a single painting in their life can still produce a piece but the same cannot be said for an art bot. A model must be trained on work that you want the model to be able to imitate.

This is why ChatGPT required the internet to do what it does (the privacy violation is another big concern there). The model needed vast quantities of information to be sufficiently trained because language is difficult to decipher. Languages evolved by getting in contact with other languages and organically making new words. ChatGPT will never invent a new word because it’s not intelligent, it is merely imitating intelligence.

BlameThePeacock@lemmy.ca · 2 years ago

“A person who has never looked upon a single painting in their life can still produce a piece but the same cannot be said for an art bot. A model must be trained on work that you want the model to be able to imitate.”

No, they really can’t. Go look a 1 year old’s first attempt at “art” because it’s nothing more than random smashing of colour on paper. A computer could easily generate such “work” as well with no training data at all. They’ve seen art at that point, and still can’t replicate it because they need much more training first.

Humans require books (or teachers who read books) to learn how to read and write. That is “vast quantities of information” being consumed to learn how to do it. If you had never seen or heard of a book, you wouldn’t be able to write a novel. It’s also completely ignoring the fact that you had to previously learn the spoken language as well (which is a vast quantity of information that takes a human decades to acquire proficiency in even with daily practice)

frog 🐸@beehaw.org · 2 years ago

Absolutely agreed! I think if the proponents of AI artwork actually had any knowledge of art history, they’d understand that humans don’t just iterate the same ideas over and over again. Van Gogh, Picasso, and many others, did work that was genuinely unique and not just a derivative of what had come before, because they brought more to the process than just looking at other artworks.

nickwitha_k (he/him)@lemmy.sdf.org · 2 years ago

Yup. There seems to be a strong motive in many to not understand this concept as it makes their practices clearly ethically questionable.

frog 🐸@beehaw.org · 2 years ago

My feeling is that the vast majority of pro-AI techbros come from a computer science, finance, or business background; undoubtedly intelligent people, but completely and utterly lacking in any appreciation or understanding of what actually goes into creative work. I’m sure they genuinely believe that there’s no difference between what a human does and what an AI does, because they think art (or writing, music, etc) are just the product of an algorithm.

Phanatik@kbin.social · 2 years ago

Ironically, my background is in mathematics but I also happen to be a writer so I see both sides of the argument. I just see the utter lack of compassion people have for those who produce creative work and the same people believe that if it can be automated, it should be automated.

lostmypasswordanew@feddit.de · 2 years ago

Humans and AI are not the same and an equivalence should never be drawn.

BlameThePeacock@lemmy.ca · 2 years ago

Your feelings don’t really matter, the fact of the matter is that the goal of ai is literally to replicate the function of a human brain. The way we’re building them is often mimicking the same processes.

nickwitha_k (he/him)@lemmy.sdf.org · 2 years ago

And LLMs and related technologies, by themselves, are artificial but not intelligent. So, the facts are not in favor of your argument to allow commercial parasitism on creative works.

BlameThePeacock@lemmy.ca · 2 years ago

I think you’re missing a point here. If someone uses these to models to produce and distribute copyright infringing works, the original rights holder could go after the infringer.

The model itself isn’t infringing though, and the process of creating the model isn’t either.

It’s a similar kind of argument to the laws that protect gun manufacturers from culpability from someone using their weapon to commit a crime. The user is the one doing the bad thing, they just produce a tool.

Otherwise, could Disney go after a pencil company because someone used one of their pencils to infringe on their copyright. Even if that pencil company had designed the pencil to be extremely good at producing Disney imagery by looking at a whole bunch of Disney images and movies to make sure it matches the size, colour, etc? No, because a pencil isn’t a copyright infringement of art, regardless of the process used to design it.

nickwitha_k (he/him)@lemmy.sdf.org · edit-2 2 years ago

Nah. You’re missing the forest for the trees. Let’s get abstract:

Person A makes a living by making product X and selling it.

Person B makes a living by making product Y and selling it.

Both A and B are in the same industry.

Person C uses a machine to extract the essence of product X and Y and blend them. Person C then claims authorship and sells it as product Z, which they sell in competition to X and Y.

Person C has not created anything. Their machine does not have value in the absence of products X and Y, yet received no permission, offers no credit nor compensation. In addition, they are competing for the same customers and harming the livelihoods of A and B. Person C is acting in a purely parasitic manner that cannot be seen as ethical in any widely accepted definition of the word.

50gp@kbin.social · 2 years ago

a human does not copy previous work exactly like these algorithms, whats this shit take?

BlameThePeacock@lemmy.ca · 2 years ago

A human can absolutely copy previous works, and they do it all the time. Disney themselves license books teaching you how to do just that. https://www.barnesandnoble.com/w/learn-to-draw-disney-celebrated-characters-collection-disney-storybook-artists/1124097227

Not to mention the amount of porn online based on characters from copyrighted works. Porn that is often done as a paid commission, expressly violating copyright laws.

ConsciousCode@beehaw.org · 2 years ago

To be honest I’m fine with it in isolation, copyright is bullshit and the internet is a quasi-socialist utopia where information (an infinitely-copyable resource which thus has infinite supply and 0 value under capitalist economics) is free and humanity can collaborate as a species. The problem becomes that companies like Google are parasites that take and don’t give back, or even make life actively worse for everyone else. The demand for compensation isn’t so much because people deserve compensation for IP per se, it’s an implicit understanding of the inherent unfairness of Google claiming ownership of other people’s information while hoarding it and the wealth it generates with no compensation for the people who actually made that wealth. “If you’re going to steal from us, at least pay us a fraction of the wealth like a normal capitalist”.

If they made the models open source then it’d at least be debatable, though still suss since there’s a huge push for companies to replace all cognitive labor with AI whether or not it’s even ready for that (which itself is only a problem insofar as people need to work to live, professionally created media is art insofar as humans make it for a purpose but corporations only care about it as media/content so AI fits the bill perfectly). Corporations are artificial metaintelligences with misaligned terminal goals so this is a match made in superhell. There’s a nonzero chance corporations might actually replace all human employees and even shareholders and just become their own version of skynet.

Really what I’m saying is we should eat the rich, burn down the googleplex, and take back the means of production.

cambriakilgannon@beehaw.org · 2 years ago

Or, if it was some non-profit doing the work for the good of everyone :')

YⓄ乙 @aussie.zone · edit-2 2 years ago

Can we get some young politicians elected who has a degree in IT ? Boomers dont understand technology that’s why these companies keeps screwing the people.

Gyoza Power@discuss.tchncs.de · 2 years ago

With each day I hate the internet and these fucking companies even more.

andresil@lemm.ee · edit-2 2 years ago

Copyright law is gaslighting at this point. Piracy being extremely illegal but then this kind of shit being allowed by default is insane.

We really are living under the boot of the ruling classes.

modulus@lemmy.ml · 2 years ago

Worth considering that this is already the law in the EU. Specifically, the Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market has exceptions for text and data mining.

Article 3 has a very broad exception for scientific research: “Member States shall provide for an exception to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, and Article 15(1) of this Directive for reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access.” There is no opt-out clause to this.

Article 4 has a narrower exception for text and data mining in general: “Member States shall provide for an exception or limitation to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, Article 4(1)(a) and (b) of Directive 2009/24/EC and Article 15(1) of this Directive for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining.” This one’s narrower because it also provides that, “The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.”

So, effectively, this means scientific research can data mine freely without rights’ holders being able to opt out, and other uses for data mining such as commercial applications can data mine provided there has not been an opt out through machine-readable means.

Queen HawlSera@lemm.ee · 2 years ago

Google is smoking that pack.

Gutless2615@ttrpg.network · 2 years ago

It’s not turning copyright law on its head, in fact asserting that copyright needs to be expanded to cover training a data set IS turning it on its head. This is not a reproduction of the original work, its learning about that work and and making a transformative use from it. An AI isn’t copying the original, its learning about the relationships that original has to the other pieces in the data set.

argv_minus_one@beehaw.org · 2 years ago

This is artificial pseudointelligence, not a person. It doesn’t learn about or transform anything.

acastcandream@beehaw.org · edit-2 1 year ago

spoiler

asdfasdfsadfasfasdf

phillaholic@lemm.ee · 2 years ago

The lines between learning and copying are being blurred with AI. Imagine if you could replay a movie any time you like in your head just from watching it once. Current copyright law wasn’t written with that in mind. It’s going to be interesting how this goes.

ricecake@beehaw.org · 2 years ago

Imagine being able to recall the important parts of a movie, it’s overall feel, and significant themes and attributes after only watching it one time.

That’s significantly closer to what current AI models do. It’s not copyright infringement that there are significant chunks of some movies that I can play back in my head precisely. First because memory being owned by someone else is a horrifying thought, and second because it’s not a distributable copy.

SkepticElliptic@beehaw.org · 2 years ago

How many movies are based on each other? It’s a lot, even if it’s just loosely based on it. If you stopped allowing that then you would run out of new things to do.

jarfil@beehaw.org · edit-2 2 years ago

my head […] not a distributable copy.

There has been an interesting counter-proposal to that: make all copies “non-distributable” by replacing the 1:1 copying, by AI:AI learning, so the new AI would never have a 1:1 copy of the original.

It’s in part embodied in the concept of “perishable software”, where instead of having a 1:1 copy of an OS installed on your smartphone/PC, a neural network hardware would “learn how to be a smartphone/PC”.

Reinstalling, would mean “killing” the previous software, and training the device again.

MachineFab812@discuss.tchncs.de · 2 years ago

Right, because the cool part of upgrading your phone is trying to make it feel like its your phone, from scratch. Perishable software is anything but desirable, unless you enjoy having the very air you breathe sold to you.

phillaholic@lemm.ee · 2 years ago

the thought of human memory being owned is horrifying. We’re talking about AI. This is a paradigm shift. New laws are inevitable. Do we want AI to be able to replicate small creators work and ruin their chances at profitability? If we aren’t careful, we are looking at yet another extinction wave where only the richest who can afford the AI can make anything. I don’t think it’s hyperbole to be concerned.

ricecake@beehaw.org · 2 years ago

The question to me is how you define what the AI is doing in a way that isn’t hilariously overbroad to the point of saying “Disney can copyright the style of having big eyes and ears”, or “computers can’t analyze images”.

Any law expanding copyright protections will be 90% used by large IP holders to prevent small creators from doing anything.

What exactly should be protected that isn’t?

acastcandream@beehaw.org · edit-2 2 years ago

Let me ask you this: do you think our brains and LLM’s are, overall, pretty distinct? This is not a trick or bait or something, I’m just going through this methodically in hopes my position - which is shared by some others in this thread it seems - is better understood.

ricecake@beehaw.org · 2 years ago

I don’t think they work the same way, but I think they work in ways that are close enough in function that they can be treated the same for the purposes of this conversation.

Pen and pencil are “the same”, and either of those and printed paper are “basically the same”.
The relationship between a typical modern AI system and the human mind is like that between a pencil written document and a word document: entirely dissimilar in essentially every way, except for the central issue of the discussion, namely as a means to convey the written word.

Both the human mind and a modern AI take in input data, and extract relationships and correlations from that data and store those patterns in a batched fashion with other data.
Some data is stored with a lot of weight, which is why I can quote a movie at you, and the AI can produce a watermark: they’ve been used as inputs a lot. Likewise, the AI can’t perfectly recreate those watermarks and I can’t tell you every detail from the scene: only the important bits are extracted. Less important details are too intermingled with data from other sources to be extracted with high fidelity.

jarfil@beehaw.org · edit-2 2 years ago

Imagine if you could replay a movie any time you like in your head just from watching it once.

Two points:

These AIs can’t do that; they need thousands or millions of repetitions to “learn” the movie, and every time they “replay” the movie it is different from the original.
“learning by rote” is something fleshbags can do, and are actually required to by most education systems.

So either humans have been breaking the copyright all this time, or the machines aren’t breaking it either.

phillaholic@lemm.ee · 2 years ago

You have one brain. You could have as many instances of AI as you can afford. In a general sense, it’s different, and acting like it’s not is going to hit you like a freight train if you don’t prepare for it.

jarfil@beehaw.org · edit-2 2 years ago

That’s a different goalpost. I get the difference between 8 billion brains, and 8 billion instances of the same AI. That has nothing to do with whether there is a difference in copyright infringement, though.

If you want another goalpost, that IMHO is more interesting: let’s discuss the difference between 8 billion brains with up to 100 years life experience each, vs. just a million copies of an AI with the experience of all human knowledge each.

(That’s still now really what’s happening, which is tending more towards several billion copies of AIs with vast slices of human knowledge each).

LastOneStanding@beehaw.org · 2 years ago

OK, so I shall create a new thread, because I was harassed. Why bother publishing anything if it’s original if it’s just going to be subsumed by these corporations? Why bother being an original human being with thoughts to share that are significant to the world if, in the end, they’re just something to be sucked up and exploited? I’m pretty smart. Keeping my thoughts to myself.

Kwakigra@beehaw.org · 2 years ago

This is a tendency I’ve heard that I haven’t been able to understand. What is the new risk of expressing your thoughts, prose, or poetry online that didn’t exist before and currently exists with LLMs scraping them? How would the corporations exploit your work through data scraping that would demotivate you to express it at all? Because I know tone doesn’t come accross well in text, I want to clarify that these are genuine questions because my answers to these questions seem to be very different than many and I’d like to understand where that difference in perspective comes from.

MrWiggles@prime8s.xyz · edit-2 2 years ago

I think this largely boils down to the time scales required. A person copying your work has a minimum amount of time it takes them to do that, even when it’s just copy and paste. An LLM can copy thousands of different developer’s code, for instance, and completely launder the license. That’s not ok. Why would we allow machines to commit fraud when we don’t allow people to?

AphoticDev@lemmy.dbzer0.com · 2 years ago

Except that isn’t exactly how neural networks learn. They aren’t exactly copying work, they’re learning patterns in how humans make those works in order to imitate them. The legal argument these companies are making is that the results from using AI are transformative enough that they qualify as totally new and unique works, and it looks as if that might end up becoming law, depending on how the lawsuits currently going through the courts turn out.

To be clear, technically an LLM doesn’t copy any of the data, nor does it store any data from the works it learns from.

MrWiggles@prime8s.xyz · 2 years ago

Except, what it produces is very similar or identical to some copyrighted works, licensed under the LGPL, like in this case. You don’t have to copy a whole program to plagiarize someone

acastcandream@beehaw.org · edit-2 1 year ago

spoiler

asdfasdfsadfasfasdf