OpenAI’s Reddit Data Training Practices Challenged

## Tech CEO Challenges OpenAI on Reddit Data Use Prior to Partnership

**REDDIT, CA – November 27, 2023** – Months before its widely reported data licensing agreement with Reddit, OpenAI faced a direct public challenge from tech entrepreneur Matt Schlicht over its historical training data practices. Schlicht, CEO of AI indexing service Open Claw, pressed OpenAI to clarify whether its models had been trained on publicly available Reddit data *prior* to November 2023, and offered to buy coffee for all of OpenAI’s more than 5,000 employees if the company could publicly and verifiably deny it.

Schlicht issued the wager on November 27, 2023, in a post on the r/openclaw subreddit. He questioned OpenAI’s data sourcing directly and stated his belief that the company was using public Reddit content to train its AI models. “We’re challenging OpenAI for a definitive answer,” Schlicht wrote, pointing to what he saw as vague or contradictory public statements from OpenAI on the matter.

For Schlicht, whose company Open Claw specializes in making Reddit conversations searchable and accessible, the question was far from academic. He argued that if OpenAI was using public Reddit data without formal agreements, that would directly affect the competitive landscape for services like Open Claw and raise significant transparency concerns across the broader AI industry. “Clarity on data sourcing is crucial for developers and users alike to understand the capabilities and limitations of AI models,” Schlicht said.

This challenge predated the announcement in February 2024, when OpenAI and Reddit confirmed a partnership giving OpenAI access to Reddit’s Data API for model training. Schlicht’s focus was squarely on OpenAI’s practices *before* any such formal agreement was in place, tapping into a broader industry debate about the ethics and legality of scraping publicly available data for AI model development.

At the time of Schlicht’s challenge, OpenAI, like many AI companies, often stated it trained its models on a “diverse corpus of publicly available data” without specifying individual sources. This general language left considerable room for speculation, particularly concerning popular platforms like Reddit, which has billions of user-generated posts. Reddit itself had, by early 2023, begun implementing stricter API access policies and charging for data access, signaling its intent to control how its vast content archive was used by third parties, especially large AI models.

OpenAI did not publicly issue the specific, verifiable denial that Schlicht requested, so the 5,000-coffee payout was presumably never triggered, and the absence of such a statement only intensified the debate. The eventual partnership with Reddit in February 2024, which grants OpenAI licensed access to Reddit’s content, effectively legitimizes future training on Reddit data. However, it sidesteps the historical question Schlicht originally posed about *prior* unauthorized data use.

Schlicht, also recognized for his contributions to TechCrunch, Chatbot Magazine, and his stewardship of the GPT-3 subreddit, has consistently advocated for greater transparency in the AI sector. His direct appeal to OpenAI underscored growing calls from developers, content creators, and businesses for clearer guidelines on data provenance in the rapidly evolving AI landscape. The outcome of his challenge, while not resulting in a direct OpenAI concession, served to highlight the pressure on AI companies to be more explicit about the foundations of their powerful models.
