Could Meta Use Threads Conversations to Train Its AI Chatbots?


Mark Zuckerberg may not have to dethrone Twitter to profit handsomely from his new app.

Is Zuckerberg undertaking this risky gambit just to spite Musk? Doubtful. There may be something else in the Threads equation for Meta, even if the app never overtakes Twitter: AI training data. Meta is developing large language models just like everyone else, and LLMs must be trained on sizable chunks of organic text covering a wide range of topics. That's why OpenAI trained the models underpinning ChatGPT on Twitter conversations—back when gathering training data was as easy as scraping the public web.

Now, tech companies are beginning to lock down their data (as Stack Overflow and Twitter have already done), which prevents other companies from accessing that source material for training language models. There’s a real possibility, then, that future LLMs are characterized by the proprietary training data available to them. Google’s models may gain an understanding of the world by consuming mountains of YouTube videos. Elon Musk may use Twitter content to train the ChatGPT rival he’s hinted at building. And Meta may one day find a rich fund of training data in Threads.

Read More at Fast Company