OpenAI’s GPTBot Crawls the Web for Fresh Training Data, and It May Collect Your Site’s Content Unless You Opt Out
OpenAI has introduced GPTBot, a web crawler that scours the internet for new data. This means your website’s content may be collected for AI training unless you opt out.
According to OpenAI, web pages crawled with the GPTBot user agent may be used to improve future models. The company adds that allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety.
OpenAI says GPTBot will skip sites that require paid subscription access. That assurance follows a recent controversy in which ChatGPT Plus subscribers using “Browse with Bing” were able to get around paywalls to read articles. GPTBot will also filter out sources known to gather personally identifiable information or that contain content violating OpenAI’s policies.
To block GPTBot from crawling your site, OpenAI provides two lines that you can add to your site’s robots.txt file. A second snippet lets GPTBot access only “specific sections of your site,” offering a middle ground between a full block and unrestricted access.
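As a rough illustration, the two kinds of rules can be checked locally with Python’s standard-library urllib.robotparser before they go live. The directives below follow the general form OpenAI describes; example.com, the directory names, and the helper gptbot_allowed are placeholders assumed for this sketch, not anything OpenAI ships.

```python
from urllib.robotparser import RobotFileParser

# Full block: the two-line rule that tells GPTBot to stay off the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Partial access: allow one section and disallow another.
# The directory names are placeholders for this sketch.
PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if these robots.txt rules let the GPTBot user agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    for name, rules in (("block all", BLOCK_ALL), ("partial", PARTIAL)):
        for path in ("/", "/directory-1/page.html", "/directory-2/page.html"):
            url = "https://example.com" + path
            print(name, path, gptbot_allowed(rules, url))
```

This only checks how a standards-following parser reads the rules. Whether a given crawler honors robots.txt is up to the crawler.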
The opt-out likely applies only to websites you own and operate. Content you post on social media or on blogging platforms such as Substack or Medium would presumably remain open to collection, since you do not control those sites’ settings.
For now, the change does not appear to have affected the experience of using ChatGPT. ChatGPT has historically relied on a static dataset that ends in 2021, and it still cannot answer questions about current events. Asked about the US Women’s Soccer Team’s performance at the 2023 World Cup, it replied, “I apologize for any inconvenience, Arnold Schwarzenegger, but based on my training data up to September 2021, I lack the capability to access real-time data or events beyond that point.” (The reference to Arnold Schwarzenegger was made for comedic effect.)
Still, GPTBot is expected to gradually improve the quality of ChatGPT’s responses as it keeps gathering data in the background.
PCMag has previously stressed the importance of giving non-paywalled publishers the option to avoid AI data extraction or to be compensated when their content appears in AI-generated responses. GPTBot could potentially fill that role, although OpenAI’s blog post is sparse on details.