Posts in: Governance

[Draft] AI misuse enforcement has a blind spot

AI providers ban accounts, not users. That distinction matters more than it might sound. When a provider detects misuse, whether it is an influence operation, an attempt to extract dangerous information, or large-scale model distillation, the standard response is to restrict the offending account. But accounts are cheap. A new email address, an anonymous SIM card, and a VPN are enough to start over.

The Stanford HAI AI Index Report 2025 recorded a 56% increase in AI-related incidents in 2024, with reports of malicious actors using AI rising eightfold since 2022. Between February 2024 and October 2025, OpenAI alone disrupted over 40 malicious networks. Google’s Threat Intelligence Group identified APT groups from more than 20 countries abusing Gemini. Anthropic documented what it described as the first AI-orchestrated cyber espionage campaign: a Chinese state-sponsored operation targeting approximately 30 entities, with human intervention limited to 20 minutes per phase while the model operated for hours. These cases represent an enforcement failure, not a detection failure: the misuse was identified, but the actors behind it were free to return.

Continue reading →


[Draft] Preventing gaming of AI evaluations: a case study of the Volkswagen diesel emissions scandal

This is a first draft of a post, written by Vinay Hiremath with initial research done along with Rebecca Hawkins and David Varga as part of a one-day AI governance research sprint.

This post uses the Volkswagen diesel emissions scandal, uncovered in 2014, as a case study in effective enforcement of AI evaluations governance. By examining the incentives, motivations, and divergent outcomes under two different regulatory frameworks in the scandal, it underlines the importance of robust governance of AI evaluations and aims to better inform the design of such governance.

Continue reading →