The training data behind OpenAI's models such as ChatGPT has sparked significant controversy, particularly among publishers. OpenAI keeps the specifics of its training data secret, but similar language models are known to consume social media posts, blogs, online reviews, and digitized books, among other internet content. That content includes articles from online news and media sites, which has led to allegations of copyright infringement. Publications like the New York Times have sued OpenAI, claiming their content fuels ChatGPT's knowledge of historical and current events. OpenAI defends its actions, citing "fair use" of publicly available materials.
Publishers are divided on how to respond. Some block OpenAI from using their content, while others, like Vox Media, have struck licensing deals. These partnerships offer publishers some control over how their content appears in AI-generated responses, along with potential benefits such as access to reader data and innovative products. However, critics argue these deals undermine publishers' intellectual property rights and credibility.
The legality of OpenAI's data usage is still under scrutiny, but for now, the media landscape is split between those who sue and those who seek to profit from the rise of generative AI.
Source: Mashable