AI providers ban accounts, not users. That distinction matters more than it sounds. When a provider detects misuse, whether it is an influence operation, an attempt to extract dangerous information, or large-scale model distillation, the standard response is to restrict the offending account. But accounts are cheap. A new email address, an anonymous SIM card, and a VPN are enough to start over. The Stanford HAI AI Index Report 2025 recorded a 56% increase in AI-related incidents in 2024, with reports of malicious actors using AI rising eightfold since 2022. Between February 2024 and October 2025, OpenAI alone disrupted over 40 malicious networks. Google’s Threat Intelligence Group identified APT groups from more than 20 countries abusing Gemini. Anthropic documented what it described as the first AI-orchestrated cyber espionage campaign, a Chinese state-sponsored operation targeting approximately 30 entities with human intervention limited to 20 minutes per phase while the model operated for hours. These instances of misuse constitute an enforcement failure, not a detection failure.
Most consumer AI products require only an email address for access. There is no regulatory mandate for AI-specific KYC: the EU AI Act mandates risk management but not customer identity verification, and California’s SB 53 focuses on safety frameworks and transparency. Even the most ambitious step to date, OpenAI’s Verified Organisation program requiring government ID and a 3D facial scan, applies only to developers seeking API access to advanced models.
Three specific gaps
Current enforcement has three critical gaps.
- Creating new accounts. Restricted users can simply register again with the same provider. OpenAI’s June 2025 report disclosed that “High Five,” a covert influence operation, repeatedly attempted to return after its initial ban. The Iran-linked STORM-2035 operation, disrupted in October 2024, reappeared eight months later. The Russia-origin “Stop News” operation had returned by October 2025 after a previous ban, evolving to generate video prompts intended for other AI models. North Korean threat actors were observed re-registering with different email addresses after termination.
- Moving to other providers. Disrupting a threat actor’s access at one provider does not stop the underlying activity. A China-linked group, “Phish and Scripts,” used ChatGPT while simultaneously researching automation via DeepSeek. An influence operation disrupted by Anthropic was linked to the same “A2Z” operation that OpenAI had previously disrupted. Anthropic’s August 2025 report identified a Telegram bot providing access to multiple AI models marketed for romance scams, with Claude specifically advertised as a “high EQ model.”
- Distributing misuse across providers. A user can split a malicious task into subtasks allocated to different AI providers. The PoisonSwarm framework (2025) demonstrated this by routing subcomponents of a malicious request to different LLM services. Without visibility into the end-to-end task, each individual provider is likely to judge its own fragment benign. This attack is almost exclusively an API-level concern and rests primarily on a research demonstration, but it becomes increasingly plausible as single-provider defences improve.
The structural case for identity-based enforcement
The most developed precedent for identity-based enforcement is the financial sector’s Know-Your-Customer regime. Banks spend an estimated $180.9 billion per year globally on KYC compliance, yet transaction monitoring systems generate 90 to 95% false positive rates. However, these failures are concentrated in continuous transaction surveillance (where financial institutions must infer intent from transaction metadata), while the underlying identity verification layers, the Customer Identification Program and Customer Due Diligence tiers codified in the FinCEN Customer Due Diligence Rule, remain highly effective. The interventions I propose borrow from these identity verification layers while substituting AI-native mechanisms for the transaction monitoring that drives the financial sector’s failures.
Examining other industries reveals five structural conditions that determine whether identity-based enforcement succeeds.
- Centralised chokepoints: Control over the gateways through which requests pass.
- Behavioural visibility: Insight into the substance of what users are actually doing.
- Low-cost digital verification: The ability to verify identity affordably at scale.
- Repeated interactions: Sustained engagement that builds behavioural profiles.
- Proportional escalation: Calibrating verification requirements to the risk level.
AI providers possess all five conditions. They control centralised API gateways, see the full content of every interaction (where financial institutions see only transaction metadata, AI providers can observe stated intent directly), can leverage modern automated verification costing $0.80 to $2.00 per check, observe iterative user engagement, and can calibrate verification to capability levels. A gateway can evaluate and block a request in milliseconds, compared to the median delay of 166 days for financial institutions to file Suspicious Activity Reports. By contrast, systems lacking these conditions, like mandatory SIM card registration (where the GSMA found no empirical evidence of crime reduction) or South Korea’s real-name internet verification (where malicious comments decreased by only 0.9% before the system was struck down), consistently fail.
Evidence from other domains
Domains with similar structural features show that verification works. Google Play blocked 1.75 million policy-violating apps in 2025, a success Google attributes to verification barriers making the cost-benefit calculation unfavourable for most threat actors. Apple’s identity verification and human review result in less than 1% of mobile malware appearing on iOS versus over 95% on Android. Chemical precursor controls achieved a reduction of over 93% in domestic meth lab seizures through tiered access requirements, though production shifted to industrial-scale operations in Mexico, illustrating the displacement dynamic that also applies to open-weight AI models. The UK’s online gambling verification regime found low rates of illegal underage gambling, with most incidents involving misuse of adult accounts rather than failures of the verification itself.
Digital platforms have reached the same conclusion, though execution matters. Discord’s February 2026 global age verification rollout triggered immediate backlash due to privacy concerns and its association with a vendor involved in a prior data breach that exposed roughly 70,000 users’ government ID photos. Discord ultimately delayed the rollout and pivoted to mandating that any vendor offering facial age estimation must perform it entirely on-device, with no biometric data leaving the user’s phone. The lessons apply directly to AI providers: communicate transparently before launch, recognise that prior breaches destroy trust, vet vendors with enforceable on-device requirements, and offer multiple verification pathways. Motivated by recent age verification legislation in various jurisdictions, the tech industry is now moving toward credentials like the OpenAge Initiative’s AgeKey, a FIDO passkey-based credential that verifies age thresholds without tracking individuals across apps. Meta is integrating AgeKey into Facebook, Instagram, and WhatsApp, with plans to roll it out across multiple countries in 2027.
What AI providers can do
I propose three categories of interventions.
1. Adopt a Tiered Architecture
Providers must implement conditional KYC escalation. Some already gate capabilities behind identity signals: OpenAI has deployed AI-based age prediction that infers whether users are minors from usage patterns, triggering additional safeguards. Anthropic’s Claude Code Security tool is restricted to Enterprise and Team customers and vetted open-source maintainers. These decisions are currently made ad hoc. A proportional mapping could systematise them:
- Tier 0: Baseline access via email for standard consumer chat. Behavioural monitoring operates passively; anomalous patterns may trigger escalation.
- Tier 1: Elevated access via phone or payment verification for higher rate limits or more capable models.
- Tier 2: Advanced access via government ID or digital identity credential for frontier model APIs or fine-tuning. This is where OpenAI’s Verified Organisation program currently operates.
- Tier 3: Institutional access involving organisational vetting for enterprise-scale API access or capabilities with significant dual-use potential.
Two categories of triggers should govern escalation: behavioural signals (repeated tripping of misuse classifiers, usage patterns suggesting possible violations) and capability-based factors (the system’s evaluated capabilities, and the type of access requested, such as fine-tuning or frontier model APIs). A significant category of concerning behaviour sits just below the ban threshold, including patterns like repeated probing of safety boundaries or systematic testing for exploitable weaknesses. These patterns do not warrant bans, but they do warrant additional scrutiny through proportional measures like triggering a verification step-up, as sketched below.
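A minimal sketch of how this tier-and-trigger mapping might be expressed, assuming hypothetical signal names (a classifier flag count, a boundary-probing score) and thresholds chosen purely for illustration rather than drawn from any provider’s actual policy:

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Verification tiers from the mapping above."""
    BASELINE = 0       # email only
    ELEVATED = 1       # phone or payment verification
    ADVANCED = 2       # government ID or digital identity credential
    INSTITUTIONAL = 3  # organisational vetting


@dataclass
class AccountState:
    verified_tier: Tier
    classifier_flags_30d: int   # hypothetical: misuse-classifier hits in the last 30 days
    probing_score: float        # hypothetical: 0..1 score for boundary-probing behaviour


# Capability-based floor: the minimum tier a capability requires regardless of behaviour.
CAPABILITY_FLOOR = {
    "consumer_chat": Tier.BASELINE,
    "frontier_api": Tier.ADVANCED,
    "fine_tuning": Tier.ADVANCED,
    "enterprise_scale_api": Tier.INSTITUTIONAL,
}


def required_tier(capability: str, account: AccountState) -> Tier:
    """Combine the capability-based floor with behavioural triggers."""
    tier = CAPABILITY_FLOOR.get(capability, Tier.BASELINE)
    # Behavioural triggers escalate by one tier rather than banning outright
    # (thresholds are illustrative, not recommendations).
    if account.classifier_flags_30d >= 3 or account.probing_score > 0.8:
        tier = Tier(min(tier + 1, Tier.INSTITUTIONAL))
    return tier


def decide(capability: str, account: AccountState) -> str:
    """Allow the request, or tell the user which verification tier to complete."""
    needed = required_tier(capability, account)
    if account.verified_tier >= needed:
        return "allow"
    return f"step_up_to_tier_{int(needed)}"
```

The design choice worth noting is that behavioural triggers escalate to a verification step-up rather than an outright denial, keeping the default response proportional to the signal.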
The tiered approach is not merely a design preference; it is the only economically viable option. Verifying OpenAI’s roughly 800 million weekly active users at $1.50 each would cost approximately $1.2 billion. If only 5 to 10% of users trigger escalation to document verification, total annual costs for a mid-sized provider fall to the low single-digit millions.
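The arithmetic behind those figures is shown below; the per-check cost and the 800 million user count come from the text above, while the 30 million user base for a mid-sized provider is an illustrative assumption:

```python
COST_PER_CHECK = 1.50  # mid-range of the $0.80-$2.00 automated verification cost cited above

# Universal verification at OpenAI scale.
weekly_active_users = 800_000_000
print(f"Universal verification: ${weekly_active_users * COST_PER_CHECK / 1e9:.1f}B")  # $1.2B

# Conditional escalation for a hypothetical mid-sized provider with ~30M users.
mid_sized_users = 30_000_000
for escalation_rate in (0.05, 0.10):
    annual_cost = mid_sized_users * escalation_rate * COST_PER_CHECK
    print(f"{escalation_rate:.0%} escalation: ${annual_cost / 1e6:.2f}M per year")  # $2.25M-$4.50M
```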
2. Leverage Existing Digital ID Infrastructure
Providers can integrate existing government digital ID systems like the EU Digital Identity Wallet (mandatory across all 27 member states by December 2026) or India’s Aadhaar (processing 2.31 billion monthly authentication transactions). Third-party verification services and reusable passkey-based credentials can fill the gaps, making repeated biometric checks and ID uploads obsolete across platforms. Mobile platform integration from Google and Apple is further expanding the set of users who can verify with minimal friction.
3. Coordinate Across Providers
Without inter-provider coordination, restrictions do not follow users across services. Three existing coordination regimes demonstrate that competitor cooperation on user safety is operationally feasible. FinCEN’s Section 314(b) program permits voluntary information sharing between over 7,000 financial institutions with statutory liability protection. The UK’s JMLIT, a public-private partnership between major banks and the National Crime Agency, has identified over 10,700 suspect accounts, seized £56 million, and contributed to 210 arrests. GIFCT’s hash-sharing database operates across 30+ platforms, achieving 90 to 96% removal rates for flagged terrorist content. An important limitation: GIFCT shares perceptual hashes of images and videos, stable digital fingerprints where the same content produces the same hash. AI misuse flags would need to encode behavioural patterns and intent inferences that are inherently platform-specific and classifier-dependent, requiring significant technical design work on signal standardisation.
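To make concrete what that standardisation work might involve, the sketch below shows one hypothetical shape for a shared misuse flag. Every field name, category, and the salting scheme are assumptions for illustration, not an existing standard:

```python
from dataclasses import dataclass, asdict
from enum import Enum
import hashlib
import json


class MisuseCategory(str, Enum):
    # Categories would have to be agreed across providers; these are placeholders.
    INFLUENCE_OPERATION = "influence_operation"
    CYBER_OFFENSE = "cyber_offense"
    FRAUD = "fraud"


@dataclass
class SharedMisuseFlag:
    """Hypothetical cross-provider flag. Unlike a perceptual content hash, the payload is a
    behavioural/intent judgement, so it must carry the originating provider's policy context."""
    subject_id_hash: str        # salted hash of a verified identity, never the raw identifier
    category: MisuseCategory
    originating_provider: str
    policy_clause: str          # which rule was violated, so definitional disputes are auditable
    confidence: float           # classifier or analyst confidence, 0..1
    evidence_summary: str       # human-readable rationale for an independent oversight body


def hash_subject(verified_identifier: str, salt: str) -> str:
    """One-way identifier so providers can match repeat offenders without exchanging identities."""
    return hashlib.sha256((salt + verified_identifier).encode()).hexdigest()


flag = SharedMisuseFlag(
    subject_id_hash=hash_subject("did:example:1234", salt="per-consortium-secret"),
    category=MisuseCategory.FRAUD,
    originating_provider="provider-a",
    policy_clause="usage-policy/romance-scams",
    confidence=0.92,
    evidence_summary="Bulk persona generation consistent with romance-scam playbooks.",
)
print(json.dumps(asdict(flag), indent=2))
```

The two departures from GIFCT-style hash sharing are that the record carries the originating provider’s policy context, which makes the definitional disputes discussed below auditable, and that only a salted identifier hash is exchanged rather than the underlying identity.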
However, voluntary coordination faces a fundamental incentive problem: providers with lighter verification attract users fleeing stricter competitors. A systematic review of 190 studies found corporate self-regulation effective only half the time. An AAAI 2025 study on White House voluntary AI commitments found mean compliance scores of only 69% for the first cohort and 44.6% for the second. Cross-provider information sharing requires explicit legal scaffolding, such as the National Cooperative Research and Production Act (NCRPA) or Defense Production Act § 708, to mitigate antitrust risks. A narrow window exists: the DOJ and FTC launched a joint public inquiry in February 2026 into new antitrust collaboration guidelines.
A further challenge: usage policies differ across providers. If providers share binary misuse flags without resolving this, a user banned under one provider’s interpretation could be flagged across the ecosystem for conduct that is fully compliant elsewhere. A federated architecture with independent oversight, perhaps modelled on ICANN’s UDRP for domain dispute resolution or the EU Digital Services Act’s trusted flagger system (which requires demonstrated expertise, independence from platforms, and accuracy), is essential for accountability and resolving definitional disputes over what constitutes misuse.
Understanding tradeoffs
These interventions incur real costs that deserve direct attention.
Classifier limitations. The BELLS framework (2025) found that at the 0.1% false positive threshold needed for production, most safety tools achieve only 5 to 6% true positive rates. A February 2026 study in Nature Communications found a 97% jailbreak success rate when using large reasoning models as autonomous adversaries. This means classifier-triggered escalation will under-flag the highest-risk users who possess the sophistication to evade detection and over-flag legitimate users who happen to trigger false positives. Classifier performance also degrades for non-English content, and one study found 61.3% of non-native English essays incorrectly labelled as AI-generated. The system must therefore layer multiple independent signals, including account metadata anomalies, usage pattern analysis, and cross-provider coordination, rather than relying on classifiers alone.
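One way to read “layer multiple independent signals” in practice is a simple corroboration rule in which no single classifier hit can, on its own, push an account over the escalation threshold. The signal names and weights below are illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class Signals:
    classifier_hit: bool        # a misuse classifier fired on at least one request
    metadata_anomaly: bool      # e.g. disposable email, account created minutes ago (illustrative)
    usage_anomaly: bool         # e.g. bulk, scripted request patterns (illustrative)
    cross_provider_flag: bool   # matched against a shared flag like the sketch above


# Illustrative weights: no single signal reaches the step-up threshold alone,
# which limits the impact of the classifier's high false-positive rate.
WEIGHTS = {
    "classifier_hit": 0.4,
    "metadata_anomaly": 0.3,
    "usage_anomaly": 0.3,
    "cross_provider_flag": 0.5,
}
STEP_UP_THRESHOLD = 0.7


def risk_score(s: Signals) -> float:
    return sum(w for name, w in WEIGHTS.items() if getattr(s, name))


def should_step_up(s: Signals) -> bool:
    return risk_score(s) >= STEP_UP_THRESHOLD


# A lone classifier false positive (score 0.4) does not trigger verification;
# a hit corroborated by a usage anomaly (0.4 + 0.3 = 0.7) does.
print(should_step_up(Signals(True, False, False, False)))  # False
print(should_step_up(Signals(True, False, True, False)))   # True
```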
Exclusion and false positives. Over 800 million people worldwide lack official proof of identity, with over 90% living in low-income countries. AI adoption in the Global South is growing rapidly, with India accounting for 9.3% of global ChatGPT traffic and adoption growth rates in the lowest-income countries running at four times those in the highest-income countries. In a cross-provider KYC regime, a single erroneous flag could propagate across all major AI services. The financial sector’s de-banking phenomenon illustrates the risk: global correspondent banking relationships declined 22% from 2011 to 2019, with catastrophic regional impacts. FATF formally recognised in its 2025 Guidance that financial exclusion is itself a risk to financial integrity. Content moderation coordination has shown parallel problems: research found that tweets in African American Vernacular English are up to twice as likely to be flagged as offensive.
Displacement to open-weight models. Imposing KYC burdens on closed APIs risks segmenting the market. Open-weight models currently trail frontier closed models in capability for the most dangerous misuse categories, but this gap is narrowing. Hugging Face surpassed two million hosted models by late 2025, researchers identified 8,608 uncensored model repositories with 43.1 million total downloads, and stripping safety protections from a 70-billion-parameter model costs under $200. The proposals in this report should therefore be understood as addressing a window of opportunity defined by the current capability gap, not as a permanent solution.
The deterrence asymmetry. The documented misuse cases motivating these proposals are dominated by state-affiliated actors and organised criminal networks, the actors least likely to be deterred by identity requirements. The system’s measurable deterrent effect will be concentrated on unsophisticated actors. I believe this still justifies the costs for three reasons: casual misuse generates real harms at scale (the influence-as-a-service operations managing hundreds of bots, the romance scam networks, the iterative probing for exploit code); identity-based enforcement raises costs even for sophisticated actors by forcing them to acquire and cycle fraudulent credentials; and the layered approach means the system does not depend on any single mechanism succeeding against every threat category. But these proposals will not solve the problem posed by state-level actors with dedicated operational infrastructure.
Each of these tradeoffs must be addressed directly in the design of any intervention. The experience of safety researchers and red-teamers, whose work by definition looks identical to misuse at the classifier level, deserves particular attention; pre-registered researcher programmes with verified credentials and exemption from automated flagging can prevent the coordination system from undermining the independent safety research on which the broader ecosystem depends. Provider-level enforcement can complement other approaches, including broader AI resilience measures and compute governance, but it is not a panacea. The building blocks for digital identity verification and more responsible AI misuse enforcement now exist, and the cost of inaction is too high.