AI decreases human-generated content, limiting data for training AI
Training an LLM on LLM-generated content is like making a photocopy of a photocopy, providing successively less satisfying results
The use of ChatGPT has led to a decrease in human-generated content, with people asking and answering fewer questions online, finds new research from Corvinus University of Budapest.
People use online content and discussions to learn new things and solve problems, and this material is also essential for training AI, particularly Large Language Models (LLMs) like ChatGPT.
Johannes Wachs, Associate Professor at Corvinus University, and colleagues from UCL and LMU Munich, investigated the impact of ChatGPT on the generation of open data on Stack Overflow, an online Q&A platform for computer programmers and an essential source of training data for LLMs.
The researchers found that, after the introduction of ChatGPT, there was a sharp decrease in human content creation: ChatGPT users are less likely to post questions and answers on the platform or visit the platform regularly.
As ChatGPT use replaces visits to online knowledge databases and discussion platforms, it displaces the human behaviour that generates the data it is trained on, so the quality and quantity of data available for training future AI decreases.
“The decreased production of open data will limit the training of future models. LLM-generated content itself is likely an ineffective substitute for training data generated by humans for the purpose of training new models. Training an LLM on LLM-generated content is like making a photocopy of a photocopy, providing successively less satisfying results,” says Professor Wachs.
The researchers explain that we should prioritise encouraging people to exchange information and knowledge online with each other, and not only rely on AI and LLMs.
These findings were first published in the journal PNAS Nexus.
/ENDS
For more information, a copy of the research paper, or to speak with Professor Johannes Wachs, please contact Kyle Grizzell from BlueSky Education on +44 (0) 1582 790709 or kyle@bluesky-pr.com
This press release was distributed by ResponseSource Press Release Wire on behalf of BlueSky Education in the following categories: Business & Finance, Manufacturing, Engineering & Energy, Computing & Telecoms. For more information visit https://pressreleasewire.responsesource.com/about.