AdieuToLogic 4 hours ago

Perhaps the most telling portion of their decision is:

  Quality concerns. Popular LLMs are really great at 
  generating plausibly looking, but meaningless content. They 
  are capable of providing good assistance if you are careful 
  enough, but we can't really rely on that. At this point, 
  they pose both the risk of lowering the quality of Gentoo 
  projects, and of requiring an unfair human effort from 
  developers and users to review contributions and detect the 
  mistakes resulting from the use of AI.

The first sentence after the heading is the most notable to consider, with the rest providing reasoning that is difficult to refute.
  • jjmarr 2 hours ago

    I've been using AI to contribute to LLVM, which has a liberal policy.

    The code is of terrible quality and I am at 100+ comments on my latest PR.

    That being said, my latest PR is my second-ever to LLVM and is an entire linter check. I am learning far more about compilers at a much faster pace than if I took the "normal route" of tiny bugfixes.

    I also try to do review passes on my own code before asking for code review to show I care about quality.

    LLMs increase review burden a ton but I would say it can be a fair tradeoff, because I'm learning quicker and can contribute at a level I otherwise couldn't. I feel like I will become a net-positive to the project much earlier than I otherwise would have.

    edit: the PR in question. Unfortunately I've been on vacation and haven't touched it recently.

    https://github.com/llvm/llvm-project/pull/146970

    It's a community's decision whether to accept this tradeoff & I won't submit AI generated code if your project refuses it. I also believe that we can mitigate this tradeoff with strong social norms that a developer is responsible for understanding and explaining their AI-generated code.

    • totallymike an hour ago

      How deliciously entitled of you to decide that making other people try to catch ten tons of bullshit because you’re “learning quicker and can contribute at a level you otherwise couldn’t” is a tradeoff you’re happy to accept

      If unrepentant garbage that you make others mop up at risk of their own projects’ integrity is the level you aspire to, please stop coding forever.

      • jjmarr an hour ago

        I didn't make a decision on the tradeoff, the LLVM community did. I also disclosed it in the PR. I also try to mitigate the code review burden by doing as much review as possible on my end & flagging what I don't understand.

        If your project has a policy against AI usage I won't submit AI-generated code because I respect your decision.

        • h4ny an hour ago

          > I didn't make a decision on the tradeoff, the LLVM community did. I also disclosed it in the PR.

          That's not what the GP meant. Just because a community doesn't disallow something doesn't mean it's the right thing to do.

          > I also try to mitigate the code review burden by doing as much review as possible on my end

          That's great but...

          > & flagging what I don't understand.

          It's absurd to me that people should commit code they don't understand. That is the problem. Just because you are allowed to commit AI-generated/assisted code does not mean that you should commit code that you don't understand.

          The overhead to others of committing code you don't understand and then asking someone to review it is a lot higher than asking someone for directions first so you can understand the problem and the code you write.

          > If your project has a policy against AI usage I won't submit AI-generated code because I respect your decision.

          That's just not the point.

      • noosphr an hour ago

        That's no different from onboarding any new contributor. I cringe at the code I put out when I was 18.

        On top of all that every open source project has a gray hair problem.

        Telling people excited about a new tech to never contribute makes sure that all projects turn into templeOS when the lead maintainer moves on.

        • totallymike an hour ago

          Onboarding a new contributor implies you're investing time into someone you're confident will pay off over the long run as an asset to the project. Reviewing LLM slop doesn't grant any of that; you're just plugging thumbs into cracks in the glass until the slop-generating contributor gets bored or feels like they got what they wanted, and then moves on to another project.

          I accept that some projects allow this, and if they invite it, I guess I can’t say anything other than “good luck,” but to me it feels like long odds that any one contributor who starts out eager to make others wade through enough code to generate that many comments purely as a one-sided learning exercise will continue to remain invested in this project to the point where I feel glad to have invested in this particular pedagogy.

        • totallymike 37 minutes ago

          Unrelated to my other point, I absolutely get wanting to lower barriers, but let’s not forget that templeOS was the religious vanity project of someone who could have had a lot to teach us if not for mental health issues that were extant early enough in the roots of the project as to poison the well of knowledge to be found there. And he didn’t just “move on,” he died.

          While I legitimately do find templeOS to be a fascinating project, I don’t think there was anything to learn from it at a computer science level other than “oh look, an opinionated 64-bit operating environment that feels like classical computing and had a couple novel ideas”

          I respect that instances like it are demonstrably few and far between, but don’t entertain its legacy far beyond that.

    • bestham an hour ago

      IMO that is not your call to make, it is the reviewer's call to make. It is the reviewers' resources you are spending to learn more quickly. You are consuming a "free" resource for personal gain because you feel it is justified in your particular case. It would likely not scale, and would grind many projects to a halt, at least temporarily, if done widely.

    • jlebar an hour ago

      As a former LLVM developer and reviewer, I want to say:

      1. Good for you.

      2. Ignore the haters in the comments.

      > my latest PR is my second-ever to LLVM and is an entire linter check.

      That is so awesome.

      > The code is of terrible quality and I am at 100+ comments on my latest PR.

      The LLVM reviewers are big kids. They know how to ignore a PR if they don't want to review it. Don't feel bad about wasting people's time. They'll let you know.

      You might be surprised how many PRs even pre-LLMs had 100+ comments. There's a lot to learn. You clearly want to learn, so you'll get there and will soon be offering a net-positive contribution to this community (or the next one you join), if you aren't already.

      Best of luck on your journey.

      • jjmarr an hour ago

        Thanks. I graduated 3 months ago and this has been a huge help.

    • thrownawayohman 26 minutes ago

      Ahhahaha what the fuck. This is what software development has become? Using an LLM to generate code that not only do you not understand, but most likely isn’t even correct, and then shoehorn the responsibility of ensuring it doesn’t break anything onto the reviewer? lol wow

  • 29athrowaway an hour ago

    LLMs trained on open source make the common mistakes that humans make.

  • paulcole 2 hours ago

    How is it telling at all?

    It’s just what every other tech bro on here wants to believe, that using LLM code is somehow less pure than using free-range-organic human written code.

  • perching_aix 4 hours ago

    [flagged]

    • AdieuToLogic 3 hours ago

      [flagged]

      • johnfn 2 hours ago

        But it's also difficult to prove it correct by argument or evidence. "Refute" is typically used in a context that suggests that the thing we're refuting has a strong likelihood of being true. This is only difficult to prove incorrect because it's a summary of the author's opinion.

      • perching_aix 3 hours ago

        [flagged]

        • sgarland 3 hours ago

          But definitions can be and are proven false. I hate it, mind you, but I can't ignore it. For example, the usage of "literally" as an intensifier, e.g. "I literally died of laughter."

          • perching_aix 3 hours ago

            Logical statements can be proven true or false. Definitions are not logical statements; they do not have truth values and therefore can be proven neither true nor false. These are mathematical logic basics.

            • drdeca 2 hours ago

              Yes. However, in some cases (though probably not the ones relevant here) a definition can be proven to be incoherent (or, to presuppose something false), which is vaguely similar to “being false”.

              • thaumasiotes 2 hours ago

                It would be difficult for a definition to make any presuppositions. You could have a definition that defines some set in which a contradiction is involved ("an integer is special if it is both prime and divisible by 4"), but then you'd say that the set so defined is empty, not that the definition is incoherent.

          • Eisenstein 3 hours ago

            But that is their whole point -- as much as you want to make the definition something else, you can't. And this is a perfect example of that.

        • thaumasiotes 2 hours ago

          > You may notice that opinions are like assholes: everyone has theirs.

          Maybe. There's a known condition in pigs that prevents them from forming one.

        • AdieuToLogic 2 hours ago

          There are three possible explanations for the published policy I can identify. If there are others, please feel free to share them.

            1 - Publicity stunt
          
            In an effort to get more attention for the Gentoo project,
            the maintainers created an outlandish policy to drive
            traffic.  This would seem unlikely due to the policy
            decision being voted upon over a year ago.
          
            2 - Fear of LLM's replacing Gentoo maintainers
          
            This appears to not be the case based on the Gentoo minutes[0] provided:
          
            Policy on AI contributions and tooling
            ======================================
          
            Motion from the email thread:
            > It is expressly forbidden to contribute to Gentoo any content that has
            > been created with the assistance of Natural Language Processing
            > artificial intelligence tools.  This motion can be revisited, should
            > a case been made over such a tool that does not pose copyright, ethical
            > and quality concerns.
          
            The vote was 6y/0n/1a (all present members voted yes).
          
            sam noted as obiter dicta that the mail also mentioned:
            > This explicitly covers all GPTs, including ChatGPT and Copilot, which is
            > the category causing the most concern at the moment.  At the same time,
            > it doesn't block more specific uses of machine learning to problem
            > solving.
          
            Several council members noted that we will revisit the policy if and
            when circumstances change and that it isn't intended to be permanent,
            at least not in its current form.
          
            3 - Experience with LLM-based change requests
          
            If the policy is neither a publicity stunt nor fear of
            LLM's replacing maintainers, then the simplest explanation
            remaining which substantiates the policy is maintainers
            having experience with LLM use and then publishing their
            decisions therein.
          
          0 - https://projects.gentoo.org/council/meeting-logs/20240414-su...

          • perching_aix 2 hours ago

            Was this meant in response to what I wrote or did you mean to post this elsewhere in the thread? If the former, I'm not sure what I'm supposed to do with this.

            • AdieuToLogic 2 hours ago

              > Was this meant in response to what I wrote or did you mean to post this elsewhere in the thread? If the former, I'm not sure what I'm supposed to do with this.

              You wrote:

                You may notice that opinions are like assholes: everyone 
                has theirs. They're literally just "thoughts and feelings". 
                They may masquerade as arguments from time to time, much to 
                my dismay, but rest assured: there's nothing to "refute", 
                debate, or even dispute on them. Not in general, nor in 
                this specific case either.
              
              I provided analysis supporting my position that the project maintainers most likely did not make this policy based on "literally just 'thoughts and feelings'" and, instead, made an informed policy based on experience and rational discourse.

              I am not a Gentoo maintainer so cannot definitively state possibility #3 is what happened. Maybe one or both of the other two possibilities is what transpired. I doubt it, but if you have evidence refuting possibility #3, please share so we may all learn.

              • perching_aix an hour ago

                An informed opinion is still an opinion. Voting itself is an expression of opinion, which they participated in - if it merely followed logically, it wouldn't have needed to be voted upon. Mind you, the "experience and rational discourse" is not presented, not in the policy, not in the excerpts and link you just provided.

                In order to "refute" their entire position, if we accept that to even make sense (I do not), I'd need to either prove them wrong about what their opinions are (nonsense), or show evidence they were actually holding a different opinion that ran contrary to what they shared (impossible, their actual opinion is known only to them, if that). There's very little "logical payload" to their published policy, if any. It's a series of opinions, and then a conclusion. Hence my example with the person not liking a given TV show, but stating their distaste as a fact of the world.

                > I doubt it, but if you have evidence refuting possibility #3, please share so we may all learn.

                Why am I being rhetorically coerced into engaging with something from a false set of options of your imagination, exactly?

              • thaumasiotes an hour ago

                > I provided analysis supporting my position that the project maintainers most likely did not make this policy based on "literally just 'thoughts and feelings'" and, instead, made an informed policy based on experience and rational discourse.

                That position would look better if they hadn't relied so heavily on feelings to justify the announcement:

                >> Their operations are causing concerns about the huge use of energy and water.

                >> The advertising and use of AI models has caused a significant harm to employees [which ones?] and reduction of service quality.

                >> LLMs have been empowering all kinds of spam and scam efforts.

                There is no experience or rational discourse involved there.

          • ants_everywhere 2 hours ago

            You're missing a very important reason

            4 - There is a very active anti-LLM activist movement and they care more about participating in it than they care about free software.

            For example, see their rationale, which is just canned anti-LLM activist talking points. You see the same ones repeated and memed ad nauseam if you lurk on anti-AI spaces.

            • AdieuToLogic 2 hours ago

              > You're missing a very important reason

              > 4 - There is a very active anti-LLM activist movement ...

              All I can say to this is that my position is that Large Language Models (LLMs) are a combination of algorithms and data.

              As such, for me they do not qualify as anything to be either "pro" or "anti", let alone a participant in an activist movement.

              • perching_aix an hour ago

                They were not talking about LLMs being participants of anything, but people who are against LLMs in whatever capacity. Surely people can be participants of a movement.

                • AdieuToLogic 44 minutes ago

                  >> All I can say to this is that my position is that Large Language Models (LLMs) are a combination of algorithms and data.

                  >> As such, for me they do not qualify as anything to be either "pro" or "anti", let alone a participant in an activist movement.

                  > They were not talking about LLMs being participants of anything ...

                  Clearly I was referencing LLMs being something to foment "an activist movement" in an attempt to de-escalate the implication that there is some kind of "anti-LLM activist movement."

                  > ... but people who are against LLMs in whatever capacity. Surely people can be participants of a movement.

                  At this point your replies to my posts appear to be intentionally adversarial.

                  • perching_aix 40 minutes ago

                    > Clearly I was referencing LLM's being something to foment "an activist movement" in an attempt to de-escalate the implication of there being some kind of "anti-LLM activist movement."

                    Well, no, that really wasn't clear to me at all. I don't think it was clear in general either.

                    > At this point your replies to my posts appear to be intentionally adversarial.

                    Not my actual intention, apologies, although I 100% understand if at this point that is not at all believable.

puilp0502 an hour ago

Every time I encounter these kinds of policies, I can't help but wonder how they would be enforced: the people who are considerate enough to abide by them are the ones who would have "cared" about code quality in the first place, so for them the policy is a moot point. OTOH, the people who recklessly spam "contributions" generated from LLMs would, by their very nature, in all likelihood not respect these policies either. To me it's like telling bullies not to bully.

By the way, I'm in no way against these kinds of policies: I've seen what happened to curl, and I think it's fully within their rights to outright ban any usage of LLMs. I'm just concerned about the enforceability of these policies.

  • joecool1029 21 minutes ago

    > I can't help but wonder how these policies would be enforced

    One of the parties that decided on Gentoo's policy effectively said the same thing. If I get what you're really asking... the reality is that there's no way for them to know if an LLM tool was used internally; it's an honor system. But enforcement is just to ban the contributor if they become a problem. They've banned or otherwise restricted other contributors in the past for being disruptive or spamming low-quality contributions.

    It's worded the way it is because most of the parties understand this isn't going away and might get revisited eventually. At least one of them hardline opposes LLM contributions in any form and probably won't change their mind.

  • userbinator 15 minutes ago

    I think it's a discouragement more than an enforcement --- a "we will know if you submit AI-generated code, so don't bother trying." Maybe those who do know how to use LLMs really well can submit code that they fully understand and can explain the reasoning of, in which case the point is moot.

  • h4ny 36 minutes ago

    You just stop accepting contributions from them?

    There is nothing inherently different about these policies that makes them more or less difficult to enforce than other kinds of policies.

dizlexic 2 hours ago

This might get me in trouble, but with all the negativity I’m seeing here I’ve got to ask.

Why do you care? Their sandbox, their rules, and if you care because you want to contribute, you're still free to do so. Unless you're an LLM, I guess, but the rest of us should have no problem.

The negativity just seems overblown. More power to them, and if this was a bad call they’ll revisit it.

  • attentive 38 minutes ago

    > and if this was a bad call they’ll revisit it.

    how would they know? - this is (one of) the ways for people to let them know

  • h4ny 25 minutes ago

    Not speaking for everyone but to me the problem is the normalization of bad behavior.

    Some people in this thread are already interpreting policies that allow contributions of AI-generated code to mean it's OK not to understand the code they write and to offload that work onto the reviewers.

    If you have ever had to review code that an author doesn't understand or written code that you don't understand for others to review, you should know how bad it is even without an LLM.

    > Why do you care? Their sandbox their rules...

    * What if it's a piece of software or dependency that I use and support? That affects me.

    * What if I have to work with these people in the same community? That affects me.

    * What if I happen to have to mentor new software engineers who were conditioned to think that bad practices are OK? That affects me.

    Things are usually less sandboxed than you think.

hjdjeiejd 3 hours ago

This is on-brand.

There was a time when I used Gentoo, and I may again one day, but for the past N years I haven't had time to compile everything from source. Compiling from source also gives a false sense of security, since you still don't know what's been compromised (it could be the compiler, etc.), and few have the time or expertise to adequately review all of the code.

It can be a waste of energy and time to compile everything from source for standard hardware.

But, when I’m retired, maybe I’ll use it again just for the heck of it. And I’m glad that Gentoo exists.

  • atrettel 3 hours ago

    At least when I used Gentoo, the point of compiling from source was more about customization than security. I remember having to set so many different options. It was quite granular. Now I just compile certain things from scratch and modify them as needed rather than having an entire system like Gentoo do that, but I do see the appeal to some people.
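
    For anyone who never ran it, the granularity mostly came down to USE flags, set globally and per package. A rough, made-up illustration of the kind of thing one ends up maintaining (not taken from any real system):

      # /etc/portage/make.conf -- global defaults (illustrative values)
      USE="-X -gtk -systemd alsa"
      MAKEOPTS="-j8"

      # /etc/portage/package.use/custom -- per-package overrides (illustrative)
      media-video/ffmpeg x264 -vaapi
      app-editors/vim -gpm python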

    • bombcar 3 hours ago

      This is exactly why I use it where I use it - on my servers. I don’t need to compile X or X support for programs that could have it, because they’re headless.

    • mikepurvis 3 hours ago

      Nix is another route if you want a compile-from-source package manager with lots of options on many packages.

      • Cyph0n 2 hours ago

        I feel like most Gentoo folks probably moved over to Nix/NixOS.

        The security argument for recompiling from source is addressed by the input-addressed (sic) package cache. The customization aspect is mostly covered by Nix package overrides and overlays. You can also set up your own package cache.

  • sgarland 3 hours ago

    Granted, I wasn’t into Arch at the time, but in the mid-aughts, Gentoo’s forums were a massively useful resource for Linux knowledge in general. That’s why I used it, anyway. The joy of getting an obscure sound card (Chaintech AV-710) to work in Linux, and sharing that knowledge with others, was enough.

  • jimmaswell 2 hours ago

    I use it on some systems powerful enough that most emerges hardly take longer than a binary package install. It's pretty nice there.

perching_aix 4 hours ago

Dated 2024-04-14 and features nothing special.

  • tptacek 4 hours ago

    Interestingly --- while I doubt it would make a difference to the decision Gentoo in particular would make --- the cost/benefit of LLMs for coding changed sharply just a month or two after this, when the first iteration of foundation models tuned for effective agents came out. People forget that effective coding agents are just a couple minutes old; the first research preview release of Claude Code was this past February.

    • malfist 3 hours ago

      > the cost/benefit of LLMs for coding changed sharply just a month or two after this

      People say this every month.

      • tptacek 2 hours ago

        Do they? I'm referring to something specific. While I happen to think LLM coding agents are pretty great, my point didn't depend on you thinking that, only on a recognition of the fact that the capabilities of these systems sharply changed very shortly after they published this --- in a very specific, noticeable way.

      • sothatsit an hour ago

        Marketing people say this every month, but that doesn't mean there haven't also been actual step-changes in AI-assisted coding in the last year.

        The policy is dated 2024-04-14. After they approved it, there was a string of releases, each a pretty dramatic advancement for coding: 3.5 Sonnet (for taste + agentic coding), o1-preview (for reasoning), Claude Code (for developer experience), o3 (for debugging), Claude 4 Opus (for reliability), and now GPT-5 Pro (for code review).

        We have advanced from AI that can unreliably help you look up documentation for tools like matplotlib, to AI tools that can write and review large complicated programs, in the last year alone. Sure, these tools still have a lot of deficiencies. But that doesn't negate the fact that the change in AI for coding in the last year has been dramatic.

    • blibble 3 hours ago

      > the cost/benefit of LLMs for coding changed sharply just a month or two after this

      no, "AI" was dogshit a year ago when the post was written, "AI" is dogshit today, and "AI" will still be dogshit in a year's time

      and if it were worth using (which it isn't), there are still the other two points: ethics and copyright

      and don't tell me to "shove this concern up your ass."

      (quoted verbatim from Ptacek's magnum opus: https://fly.io/blog/youre-all-nuts/)

      • jatora 3 hours ago

        [flagged]

        • tptacek 3 hours ago

          We're not supposed to write comments like this.

  • notherhack 3 hours ago

    Important point. A lot has changed in coding AIs since then.

ericdotlee an hour ago

Humans are important - but I have to wonder how any of this will be enforced.

ares623 4 hours ago

Maybe we’ll see a (new) distro with AI assisted maintainers. That would be an interesting experiment.

Unfortunately, one caveat is that it will be difficult to separate the maintainers from the financial incentives, so it won't be a fair comparison (e.g. the labs funding full-time maintainers with salaries and donations that other distros can only dream of).

simianwords 3 hours ago

> Ethical concerns. The business side of AI boom is creating serious ethical concerns. Among them: Commercial AI projects are frequently indulging in blatant copyright violations to train their models. Their operations are causing concerns about the huge use of energy and water. The advertising and use of AI models has caused a significant harm to employees and reduction of service quality. LLMs have been empowering all kinds of spam and scam efforts.

Highly disingenuous. First, AI being trained on copyrighted data is considered fair use because it transforms the underlying data rather than distributing it as is. Though I have to agree this is the strongest of the ethical claims against using AI, it stands weak when looked at as a whole.

The fact that they mentioned "energy and water use" should tell you that they are really looking for reasons to disparage AI. AI doesn't use any more water or energy than any other tool. An hour of Netflix uses the same energy as more than 100 GPT questions. A single 10-hour flight (per person*) emits as much as around 100k GPT prompts. It is strange that one would repeat the same nonsense about AI without the primary motive being ideological.

"The advertising and use of AI models has caused a significant harm to employees and reduction of service quality." this is just a shoddy opinion at this point.

To be clear - I understand why they might ban AI for code submissions. It reduces the barrier significantly and increases the noise. But the reasoning is motivated from a wrong place.

  • themafia 2 hours ago

    > AI being trained on copyrighted data is considered fair use because it transforms the underlying data rather than distribute it as is.

    It's not a binary. Sometimes it fully reproduces works in violation of copyright, and other times it modifies them just enough to avoid claims against its output. Using AI and just _assuming_ it would never lead you to a copyright violation is foolish.

    > uses same energy as more than 100 GPT questions.

    Are you including training costs or just query costs?

    > But the reasoning is motivated from a wrong place.

    That does not matter. What matters is if the outcome is improved in the way they predict. This is actually measurable.

    • simianwords 2 hours ago

      >That does not matter. What matters is if the outcome is improved in the way they predict. This is actually measurable.

      OK, let's discuss facts.

      >It's not a binary. Sometimes it fully reproduces works in violation of copyright, and other times it modifies them just enough to avoid claims against its output. Using AI and just _assuming_ it would never lead you to a copyright violation is foolish.

      In the Anthropic case, the judge ruled that AI training is transformative. It is not binary, as you said, but I'm criticising what appears as binary in the original policy. When the court ruling itself has found that it is not a violation of copyright, it is reasonable to criticise the policy now, although I acknowledge the post was written before the ruling.

      >Are you including training costs or just query costs?

      The training costs are very small because they are amortised over all the queries. I think training accounts for around .001% to .1% of each query, depending on how many training runs are done over a year.

      • twelvechairs an hour ago

        On copyright it's worth noting that Gentoo has a substantial user base outside the USA (maybe primarily - see [0]) for whom the Anthropic judgment you mention probably doesn't mean much.

        [0] https://trends.builtwith.com/Server/Gentoo-Linux

        • simianwords an hour ago

          Fair point, but I would think the EU would be all up on this. This is right up their alley and clearly an easy way to justify more regulations and slow down AI. Why hasn't anything come out of it?

  • ses1984 2 hours ago

    The idea that models are transformative is debatable. Works with copyright are the thing that imbues the model with value. If that statement isn’t true, then they can just exclude those works and nothing is lost, right?

    Also, half the problem isn't distribution, it's how those works were acquired. Even if you suppose models are transformative, you can't just download stuff from piratebay. Buy copies, scan them, rip them, etc.

    It's super not cool that billion-dollar VC companies can just do that.

    • simianwords 2 hours ago

      > In Monday's order, Senior U.S. District Judge William Alsup supported Anthropic's argument, stating the company's use of books by the plaintiffs to train their AI model was acceptable.

      "The training use was a fair use," he wrote. "The use of the books at issue to train Claude and its precursors was exceedingly transformative."

      I agree it is debatable, but it is not so clear-cut that it is _not_ transformative when a judge has ruled that it is.

    • perching_aix 2 hours ago

      > The idea that models are transformative is debatable. Works with copyright are the thing that imbues the model with value. If that statement isn’t true, then they can just exclude those works and nothing is lost, right?

      I don't follow.

      For one, all works have a copyright status, I believe (under US jurisdiction; this of course differs per jurisdiction, although there are international IP laws); some are just extremely permissive. Models rely on a wide range of works, some with permissive, some with restrictive licensing. I'd imagine Wikipedia and StackOverflow are pretty important resources for these models, for example, and both are licensed under CC BY-SA 4.0, a permissive license.

      Second, despite your claim thus being false, dropping restrictively copyrighted works would of course make a dent, I'm pretty sure, although how much, I'm not sure. I don't see why this would be a surprise: restrictively licensed works do contribute value, but not all of the value. So their removal would take away some of the value, but not all of it. It's not binary.

      And finally, I'm not sure these aspects solely or even primarily determine whether these models are legally transformative. But then I'm also not a lawyer, and the law is a moving target, so what do I know. I'd imagine it's less legal transformativeness and more colloquial transformativeness you're concerned about anyhow, but then these are not necessarily the best aspects to interrogate either.

  • CursedSilicon 2 hours ago

    That's quite a strawman definition of "copyright infringement" especially given the ongoing Anthropic lawsuit

    It's not a question of if feeding all the worlds books into a blender and eating the resulting slurry paste is copyright infringement. It's that they stole the books in the first place by getting them from piracy websites

    If they'd purchased every book ever written, scanned them in and fed that into the model? That would be perfectly legal

  • infamia an hour ago

    > Highly disingenuous. First, AI being trained on copyrighted data is considered fair use because it transforms the underlying data rather than distribute it as is.

    Your legal argument aside, they downloaded torrents and trained their AI on them. You can't get much more blatant than that.

    • simianwords an hour ago

      Yes, but that was one company, and it is not core to their infra or product. So I don't know how one can characterize AI as fundamentally unethical because one company pirated some books.

  • shmerl 2 hours ago

    I don't get this idea. Transformative works don't automatically equal fair use - copyright covers all kinds of transformative works.

logicprog 4 hours ago

There are reasonable ethical concerns one may have with AI (around data center impacts on communities, and the labor used to SFT and RLHF them), but these aren't:

> Commercial AI projects are frequently indulging in blatant copyright violations to train their models.

I thought we (FOSS) were anti copyright?

> Their operations are causing concerns about the huge use of energy and water.

This is massively overblown. If they'd specifically said that their concerns were around the concentrated impact of energy and water usage on specific communities, fine, but then you'd have to have ethical concerns about a lot of other tech, including video streaming. The overall energy and water usage of AI attributable to an actual individual use, for instance generating a PR, is completely negligible on the scale of tech products.

> The advertising and use of AI models has caused a significant harm to employees and reduction of service quality.

Is this talking about automation? You know what else has automated away employees and can often reduce service quality? Software.

> LLMs have been empowering all kinds of spam and scam efforts.

So did email.

  • Veedrac 2 hours ago

    I get why water use is the sort of nonsense that spreads around mainstream social media, but it baffles me how a whole council of nerds would pass a vote on a policy that includes that line.

    • simianwords 2 hours ago

      Because it is ideologically motivated.

  • CursedSilicon 2 hours ago

    >I thought we (FOSS) were anti copyright?

    FOSS still has to exist within the rules of the system the planet operates under. You can't just say "I downloaded that movie, but I'm a Linux user so I don't believe in copyright" and get away with it

    >the overall energy and water usage of AI contributed to by the actual individual use of AI to, for instance, generate a PR, is completely negligible on the scale of tech products.

    [citation needed]

    >Is this talking about automation? You know what else automated employees and can often reduce service quality? Software.

    Disingenuous strawman. Tech CEOs and the like have been exuberant at the idea that "AI" will replace human labor. The entire end-goal of companies like OpenAI is to create a "super-intelligence" that will then generate a return. By definition the AI would be performing labor (services) for capital, outcompeting humans to do so. Unless OpenAI wants it to just hack every bank account on Earth and transfer it all to them instead? Or something equally farcical.

    >So did email.

    "We should improve society somewhat"

    "Ah, but you participate in society! Curious!"

  • AdieuToLogic 3 hours ago

    >> Commercial AI projects are frequently indulging in blatant copyright violations to train their models.

    > I thought we (FOSS) were anti copyright?

    No free and open source software (FOSS) distribution model is "anti-copyright." Quite to the contrary, FOSS licenses are well defined[0] and either address copyright directly or rely on copyright being retained by the original author.

    0 - https://opensource.org/licenses

    • bombcar 3 hours ago

      Some of the ideas behind the GPL could be anti-copyright, insofar as what they'd love to see is software being uncopyrightable.

  • bleepblap 4 hours ago

    >> Commercial AI projects are frequently indulging in blatant copyright violations to train their models.

    > I thought we (FOSS) were anti copyright?

    Absolutely not! Every major FOSS license has copyright as its enforcement method -- "if you don't do X (share code with customers, etc depending on license) you lose the right to copy the code"

hsbauauvhabzb 3 hours ago

> Their operations are causing concerns about the huge use of energy and water.

I'd be curious how much energy Gentoo consumes versus a binary distro.

  • mervn 4 minutes ago

    [dead]

mmaunder 3 hours ago

Posted April 2024. I wonder how they feel about this now. Or will next year. Claude Code wouldn't exist for another year when this was posted. Never mind Codex. It's already awkward. Within 12 months it will be cringeworthy.

danpalmer 3 hours ago

This is a prime example of poor AI policy. It doesn't define what AI is – is using Google Translate in order to engage on their mailing lists allowed? Is using Intellisense-like tools that we've had for decades allowed? The rationale is also poor, citing concerns that can be applied far more widely than just LLMs. The ethical concerns are pretty hand-wavy; I'm pretty sure email is used to empower spam, and yet I suspect Gentoo has no problem using email.

The end result is not necessarily a bad one, and I think reasonable for a project like Gentoo to go for, but the policy could be stated in a much better way.

For example: thou shalt only contribute code that is unencumbered by copyright issues; contributions must be of high quality, and repeated attempts to submit poor-quality contributions may result in new contributions not being reviewed/accepted. As for the ethical concerns, they could just take a position by buying infrastructure from companies that align with their ethics, or by not accepting corporate donations (time or money) from companies they disagree with.

  • Spivak 3 hours ago

    Or, because this is a policy by and for human adults who all understand what we're talking about, you just don't accept contributions from anyone obviously rule-lawyering in bad faith.

    This isn't a court system, anyone intentionally trying to test the boundaries probably isn't someone you want to bother with in the first place.

    • danpalmer an hour ago

      This policy being so specific in what it bans means that you can't enforce it easily against people who are close but technically within the letter of the policy, and you create a grey area and friction for those who are meeting the spirit of the policy in good faith, but technically in violation.

      I have friends and colleagues who I trust as good engineers who take different positions on this (letter vs spirit) and I think there are good faith contributions negatively impacted by both sides of this.

  • dmead 3 hours ago

    > It doesn't define what AI is

    this is a bad faith comment.

    • danpalmer an hour ago

      Honestly, I tried to make this in good faith. The examples I gave were perhaps extreme, but my point is that AI is a moving target. Today it means specifically generative AI done by large models – usually not classification, recommendations, and usually not "small" models, all of which have been normalised. LLMs are becoming normalised, and policy needs to be able to keep up to the shifting technological landscape.

      Defining policy on the outcomes, rather than the inputs, makes it more resilient and ultimately more effective. Defining policy on the inputs is easy to dismantle.

    • malfist 3 hours ago

      The whole argument smacks of bad-faith "yet you participate in society" reasoning.