My last post caught the attention of my old friend Sandro 'strk' Santilli on Mastodon, who sent me a provocation about the whole AIAD thing. So, challenge accepted.
First and foremost, AIAD is a complex and hotly debated topic. Whether the training practices of the past and present should be considered fair use is a matter of contention, particularly for existing code bases hosted on well-known repositories, and for other types of content as well.
In the specific case of code, the basis for deciding such a question was formally stated in the accompanying licenses. Honestly, there is nothing in the current formulation of OSI licenses (BSD and GPL, among others) that precludes this very particular activity. The non-discrimination conditions apply to any human activity, and neural network training, undoubtedly a human-driven task, is not excluded. Training is, after all, a kind of learning activity, or something close to it.
That said, NN training is, of course, a massive and intensive kind of learning. But let me consider any artifact created by an AI model as a direct derivative product (this is a stretch of the concept for me, but let me accept it axiomatically). That could be an 'original' product created from scratch (i.e., by direct prompting or from a limited documentary context) or a directly derivative one (because it is based on previous code supplied as part of the AI context). If you ask any of the big models for a direct clarification of their terms of use, you will discover that the provider retains copyright ownership of its artifacts, but grants a very permissive use that adheres perfectly to the four fundamental freedoms of FOSS licenses. Of course, the result could be based on a previous code base; in that case, its license still applies to such a derivative work, even alongside the additional copyright of Anthropic, Google, or OpenAI.
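To make that stacking concrete, here is a purely illustrative sketch of what a file header might look like in such a scenario. All names, years, and the provider's claim itself are invented for illustration; the only point is that copyright lines accumulate while the original license keeps governing the derivative.

```python
# SPDX-License-Identifier: GPL-2.0-or-later
#
# Copyright (C) 2018 Original Author <original@example.org>   (pre-existing code base)
# Copyright (C) 2025 Jane Hacker <jane@example.org>           (human contributions)
# Copyright (C) 2025 Hypothetical AI Provider                 (AI-generated portions,
#                                                              if a claim is asserted)
#
# The GPL of the pre-existing code base still governs the whole file:
# new copyright lines accumulate, but they cannot override the license.

def normalize(values):
    """Scale a list of numbers into the 0..1 range (trivial placeholder code)."""
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]
```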
This could pose a significant problem if the final software product needs to retain a single copyright holder, as is the case with most proprietary software and some FOSS projects. The potential impact on copyright ownership is a key consideration here, and one worth being fully aware of.
Is the process of an AI participating in development really that different from what any average hacker does when contributing to FOSS projects? I don't think so. You add your copyright to the existing ones for the parts under your direct control, and you accept the conditions of use already defined in the license. The real challenge, in my humble opinion, lies in changing licenses: any ex-post change should start with acceptance by all copyright holders, including Anthropic, Google, etc.
I consider this difficulty a feature, not a bug. I advise against participating in projects that require copyright assignment, precisely because the license can be changed later, once your contributions have moved out of your control. It happened in the past, and it will happen again.
One should also consider the licenses of the many other sources of knowledge and inspiration for developers. How many people know that the default license for StackOverflow code snippets is CC BY-SA? Indeed, how many developers actually add an acknowledgment in their software for such snippets? Or for snippets taken from sites, blogs, books, or manuals, without considering that such sources are often even more restrictive about use and derivative works? Isn't our entire daily work the result of a long learning phase, built on our education, on books, on the experience of others, and on trial and error?
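For the record, doing it right is not hard. Here is a minimal sketch of what CC BY-SA attribution for a borrowed snippet could look like in practice; the URL and author handle below are placeholders, not real references.

```python
# Adapted from a Stack Overflow answer (placeholder reference, fill in the real one):
#   Source:  https://stackoverflow.com/a/XXXXXXX
#   Author:  <answer author's handle>
#   License: CC BY-SA 4.0 (attribution and share-alike required)

def chunked(seq, size):
    """Yield successive chunks of `size` items from `seq`."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

print(list(chunked([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```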
That said, let me spend a few words on the elephant in the room: the nature of AI models themselves and their capacity to create original works. Do AI models dream of electric sheep? Well, I don't think current models are pure stochastic parrots, to be honest. I suspect there are dozens or hundreds of cognitive forms that govern what we generically call intelligence, including some emotive and empathic forms that one can also find in a dog, a cat, or a dolphin. One of those forms is probably captured by the functional representation of neural network models, and is perhaps shared with part of our own minds. On average, we are much more effective and efficient in those regards, and, to be fair, hallucinating is a common experience on the human side too. We are much more complete, and we retain contexts as wide as a lifetime. Some people are terrified by this observation and seek refuge in denial or in the certainties of faith.
We are complex organisms, governed by processes that are still only partially understood, and our so-called intelligence is physically based on cells and energy in our brains, whether we like it or not. We have found a way to mimic part of this complex process, with all the limitations of the case. Is this intelligence? I don't know. But at the end of the day, what is intelligence? When no one asks me, I know perfectly well what it is; if someone asks me, I no longer know.