Most people are neither aware nor concerned that LLMs are created through what amount to illicit means, and that they may in fact contain and regurgitate copyrighted works.
If you ask GPT-4 to write a passage in the style of Carmen Machado or Margaret Atwood or Alexander Chee, it will do a fair job of it, and for good reason: it likely ingested all their works in the training process, and now uses their ingenuity as its own. But these authors, and thousands more, are not happy with this fact.
In an open letter signed by more than 8,500 authors of fiction, non-fiction and poetry, the tech companies behind large language models like ChatGPT, Bard, LLaMa and more are taken to task for using the authors' writing without permission or compensation.
“These technologies mimic and regurgitate our language, stories, style, and ideas. Millions of copyrighted books, articles, essays, and poetry provide the ‘food’ for AI systems, endless meals for which there has been no bill,” the letter reads.
Despite their systems proving capable of quoting and imitating the authors in question, AI developers have not substantially addressed the provenance of these works. Are they trained on samples scraped from bookstores and reviews? Did they borrow every book from the library? Or perhaps they simply downloaded one of the many illegal archives, like Libgen?
Read More at TechCrunch