LLM Scrapers Allegedly Target Multiple Open Source Projects Disrupting the FOSS Ecosystem

March 17, 2025

In mid-March 2025, KDE's GitLab infrastructure was disrupted by allegedly aggressive AI web scrapers originating from Alibaba IP ranges. The bots reportedly ignored robots.txt and spoofed browser headers, overloading the site and causing outages for developers. Similar incidents were reported at other FOSS projects, including GNOME, SourceHut, and Fedora. The scraping is allegedly tied to large language model training and imposes real costs and delays on the affected projects.

This incident highlights the importance of implementing guardrails for AI, such as those promoted by Project Cerebellum's AI governance efforts. By mapping incidents like this one to HISPI Project Cerebellum TAIM (Govern function), we can better understand, measure, and manage these types of threats to safe and secure AI practices.

Alleged deployer

unnamed-generative-ai-companies, alibaba

Alleged developer

unnamed-generative-ai-companies, alibaba

Alleged harmed parties

sysadmins, sourcehut, read-the-docs, linux-weekly-news, kde, inkscape, gnome, foss-projects-and-communities, fedora, diaspora, curl

Data source

Incident data is from the AI Incident Database (AIID).

When citing the database as a whole, please use:

McGregor, S. (2021) Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database. In Proceedings of the Thirty-Third Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-21). Virtual Conference.

Pre-print on arXiv · Database snapshots & citation guide

We use weekly snapshots of the AIID for stable reference. For the official suggested citation of a specific incident, use the “Cite this incident” link on each incident page.

Matched TAIM controls