Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

February 28, 2025

An investigation reportedly uncovered 12,000 live API keys and authentication credentials in a dataset used to train large language models (LLMs). Preliminary findings suggest that some of these secrets were still active when they were found, potentially allowing malicious actors to gain unauthorized access. The credentials were found in the December 2024 archive of Common Crawl, a web corpus that spans approximately 250 billion pages in total. If exploited, they could have enabled data breaches, service disruptions, and financial fraud, among other harms. The incident underscores the importance of trustworthy AI governance and of safe, secure AI practices.

Join us at Project Cerebellum to help establish guardrails for AI and ensure harm prevention through our HISPI Project Cerebellum TAIM (Govern) initiative.

Alleged deployer

Microsoft, OpenAI, Common Crawl, Microsoft Azure OpenAI Service

Alleged developer

Common Crawl, OpenAI, Microsoft

Alleged harmed parties

AWS, Slack, Mailchimp, Microsoft, Google, Intel, Huawei, PayPal, IBM, Tencent

AI governance case studies

For forensic AI governance failure analysis (TAIMScore™ case studies), browse Human Signal’s Failure Files™.

Data source

Incident data is from the AI Incident Database (AIID).

When citing the database as a whole, please use:

McGregor, S. (2021) Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database. In Proceedings of the Thirty-Third Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-21). Virtual Conference.

Pre-print on arXiv · Database snapshots & citation guide

We use weekly snapshots of the AIID for stable reference. For the official suggested citation of a specific incident, use the “Cite this incident” link on each incident page.

