Published on Mar 17, 2024 [Permalink]
Reading time: 13 minutes
Posted in: Governance

[Draft] Preventing gaming of AI evaluations: a case study of the Volkswagen diesel emissions scandal

This is a first draft of a post, written by Vinay Hiremath with initial research done along with Rebecca Hawkins and David Varga as part of a one-day AI governance research sprint.

In this post, the Volkswagen diesel emissions scandal uncovered in 2014 is used as a case study for effective enforcement of AI evaluations governance. A discussion of the incentives, motivations, and divergent outcomes between two different regulatory frameworks in the diesel emissions scandal is used to underline the importance of robust governance of AI evaluations and better inform the design of such governance.

Background on emissions regulation

A crucial factor in the Volkswagen diesel emissions scandal was the gap between vehicle emissions regulations in the US and the EU, defining the incentive structures at the time of the fraudulent actions. Here, a brief history and overview of these two environments is provided.

In the wake of smog concerns raised through the 1950s and 1960s in regions such as Southern California, the state of California passed landmark legislation targeting vehicle emissions in the Motor Vehicle Pollution Control Act of 1959, which outlined standards that were later adopted federally in 1968, in turn leading to the creation of the California Air Resources Board (CARB) and Environmental Protection Agency (EPA) in 1967 and 1970, respectively.¹ This regulation resulted in rapid improvements in vehicle emissions, including by banning lead additives in gasoline to improve the adoption of catalytic converters (otherwise damaged by lead additives), which in turn converted carbon monoxide (CO), nitrogen oxide (NOx), and hydrocarbons in vehicle emissions into less harmful substances.² Over the next few decades, improving catalytic converter technology and growing state and federal regulations further reduced emissions of various pollutants.¹ This was followed by setting minimum average fuel economy ratings for vehicles during the 1970s oil crisis through the corporate average fuel economy (CAFE) system in 1975, with periodically increasing averages thereafter. Most notably, US regulations started with vehicle emissions before adding fuel economy standards in response to unstable oil prices. With little diesel passenger vehicle dependence in the US, the EPA was able to clamp down on pollutant emissions, a traditional drawback of diesel engines. EPA Tier 2 restrictions in 2007 limited NOx for diesel engines to 0.07 grams per mile, by far the toughest standard in the world, undercutting even Euro 6 standards which were to come into effect in 2014.³

Meanwhile, European countries largely responded to the 1970s oil crisis by increasing taxes to reduce fuel consumption, with many taxing diesel at a lower rate than gasoline due to its higher fuel economy.⁴ Consequently, the share of diesel among new passenger cars in Western Europe rose from around 15% in 1990 to 51.6% in 2015, compared to around 0% in 1990 and 3% in 2015 in the US.¹ Moreover, vehicle emission regulations were adopted much later in Europe, with catalytic converters required only since the start of the 1990s and leaded fuel banned regionally by 2000 (both around a decade later than the US). While Euro 1 standards for emissions were adopted in 1992, current Euro 6 diesel standards (limiting NOx to 10% of 1992 levels) remain far less stringent than those in the US. In summary, European regulations largely stemmed from oil crisis-era fuel economy concerns that incentivised diesel passenger vehicles, with vehicle emissions regulated afterwards while attempting to avoid upsetting this status quo.

Volkswagen’s diesels

At Volkswagen, Martin Winterkorn was appointed as CEO in 2007 and announced a plan to make Volkswagen the most profitable and environmentally-friendly car company by 2018.⁵ With new US EPA Tier 2 diesel emissions limits seen as particularly strenuous for Volkswagen given its diesel dependence in the European market and interest in growing its US presence, there was great interest in "solving" the clean diesel problem to comply with US regulations. However, reducing diesel emissions had been known to require expensive and heavy equipment re-engineered into vehicles and/or large urea tanks that needed regular refilling. Volkswagen had claimed to avoid these limitations with its clean diesel technology, with its diesel sales increasing more than 150% in the US between 2009 and 2013, and as Winterkorn had planned VW surpassed Toyota to become the largest automaker by global sales in mid-2015.

With strong internal motivation to expand VW diesel market share in the US and far tougher US EPA Tier 2 standards since 2007, there was clear motivation to comply or find a way to bypass compliance in the US market at least until Euro 6 standards (nearly as stringent as EPA Tier 2 diesel standards) came into effect in 2014. Given the well-defined nature of lab emissions testing, with specific steering, acceleration, and other parameters, it was conceived that defeat code could be used for compliant engine behaviour in testing and more flexible operation in the real world (improving fuel economy with far higher emissions). Implementing such a defeat device was a decision that both engineers working on the project as well as senior leadership, including CEO Winterkorn, knew about. In one meeting with engineers and leadership, while discussing the development of such as a device, at least one member warned of damage to VW’s reputation if their work was discovered, but these concerns were ignored by upper management.

Evading regulations

There remained the question of how VW had managed to engineer cleaner diesel engines with only a small urea tank rather than the bulky equipment other manufacturers used, and the International Council on Clean Transportation (ICCT) chose to look into VW’s diesels due to possible discrepancies in European real-world emissions data.⁶ The ICCT commissioned researchers at the Center for Alternative Fuels, Engines, and Emissions (CAFEE) at Western Virginia University who (using rented diesel VWs due to budget constraints) measured emissions with portable equipment rather than the more common lab setting typically used for regulatory testing. Finding significant anomalies in real-world emissions and finally concluding that the vehicles had markedly different emissions behaviours in lab environments, the researchers published their work at a public conference in 2014 which garnered the interest of CARB and the EPA. Teams of experts then spent around a year further validating the theory of a "defeat device" in the engine control unit (ECU) code, with VW diesel vehicles exceeding the NOx emissions limits by 5-35 times while a comparable BMW model passed the tests.

After mounting EPA and CARB investigations into VW’s emissions anomalies following the disclosure by CAFEE researchers, VW internally assessed its liability if admitting fraud at around $20 billion (USD).⁵ The automaker chose to deny fault and mislead investigators, publishing an ineffective software fix while destroying evidence such as documents and mobile phones. Following the confession of an employee in 2015, VW finally admitted to fraud under the threat of its 2016 vehicles not being certified for sale in the US, later pleading guilty to fraud, obstruction of justice, and other charges alongside various EPA claims, state AG lawsuits, and payouts to consumers.

Outcomes

Particularly relevant for designing robust and effective systems for enforcing regulations are the divergent outcomes in the US as opposed to Germany and the EU. With an estimated 45,000 DALYs lost as a result of the fraudulent emissions by vehicles sold between 2009 to 2015, the public health consequences are notable and mounting due to delayed recalls in some markets.⁷

In the US, with 500,000 fraudulently compliant vehicles sold, more robust legal processes including class action lawsuits to more effectively combine claims from thousands of individual owners, criminal liability for corporations, and a stronger legal framework around violation of air quality laws resulted in a buyback program for US owners estimated to cost $20 billion (USD), $2.8 billion in criminal penalties, and $4.7 billion for clean air projects and consumer education.⁸

Meanwhile, in the EU, the absence of class action or other methods of similar interest lawsuits resulted in 50,000 lawsuits by individual owners as of February 2018, with more diverse rules of sales law and tort law resulting in diverse decisions that VW has opted to await rather than admitting fraud with clear buyer compensation and penalties as in the US.⁸ Issues include a two-year limitation period for certain sales law claims in Germany, lack of clarity on the effectiveness of software fixes, the limited scope of German tort law for violations of "absolute" rights rather than economic losses, no statute barring criminal conspiracies, and no law against lying to regulators or investigators. These complexities limited direct liability to fines of up to €25,000 (EUR) and mandatory software upgrades despite around 8.5 million affected vehicles sold in the EU, far more than in the US.

Furthermore, due in part to widespread noncompliance and significant consumer dependence on diesel passenger vehicles, European regulators responded to the scandal by weakening testing for several years afterwards including allowing NOx 110% above the legal limit until 2020 and 50% afterwards.¹ In contrast, the EPA and other US agencies increased future testing and extended the certification process besides mandating spending on clean air projects.

Relevance to AI evals governance

There are several analogies by which the background, motivations, and outcomes of the Volkswagen diesel emissions scandal may inform more robust governance around AI model evaluations to prevent gaming. Governance of AI model evaluations is defined here to include policies that require certain evals work to be done on AI models, which may specify internal/external audits, testing for specific dangerous behaviours, etc.

To justify the comparison with the VW diesel scandal, it is perhaps most illustrative to view the different emissions behaviour during testing and in real-world use as analogous to varying behaviour of AI models when undergoing evaluations in testing as opposed to "in the wild," which is expected to pose a concern in the form of goal mis-generalisation or even strategically deceptive behaviour.⁹¹⁰ In both scenarios, there are motivations to comply with testing until the model is deployed, and once deployed there is a motivation to fail to comply in favour of other objectives, so the regulatory approaches to mitigate these failures with diesel emissions may clarify governance approaches to prevent gaming of AI model evals.

Somewhat notably, the VW diesel scandal was unravelled by a small university group with relatively low-cost testing methods and no specific motive to uncover fraud. A similar low barrier to entry is currently the case with many AI evals research projects which only require API access to a model running in the cloud, and it is worth considering the value of ensuring this continues to grow the body of qualified researchers and therefore increase the likelihood of discovering evals gaming, while remaining mindful to limit potentially harmful access.¹¹

Additionally, the use of portable emissions equipment that VW did not plan for, in contrast to lab treadmill testing, avoided the defeat device testing mode and exposed real-world emissions data. This can be seen as analogous to running AI evals in unconventional testing environments, a research and engineering problem that can be designed into regulation to ensure evaluations are not limited to easily detectable test settings. There are already concerns about misreporting AI model capabilities, data center compute capabilities, and other metrics, reminiscent of the Volkswagen emissions fraud and perhaps prevented by similar third-party certification and auditing measures.¹²

Another important lesson from the VW diesel emissions scandal was the conflict of interest in EU diesel regulation due to the large consumer dependence on diesel passenger vehicles. This encouraged more lax regulations of diesel emissions as well as responses by regulators focused on avoiding market instabilities rather than enforcing emissions limits. This illustrates the importance of tracking and possibly limiting the integration of AI models in economic sectors or regions relevant to regulators, to protect the independence of AI evals regulations. Furthermore, the divergent emissions limits between the US and EU markets likely reduced the incentive to re-engineer a costly solution in line with harsher US regulations, resembling potential race-to-the-bottom scenarios in the absence of cooperative international governance of AI development¹³

Furthermore, the role of management incentives and other process-based factors that allowed for the VW diesel scandal may be relevant to improving governance of AI model creators to ensure more robust AI evaluations free from gaming. The flexibility that VW management had in orchestrating an engineering solution to fraudulently comply with emissions regulations can be limited by process-based approaches. Just as the EPA is typically only able to test about 15% of powertrains due to limited resources, AI evals labs remain limited in number and scope with relatively little funding compared to corporations that train and publish AI models. One approach may resemble elements of PCI compliance, which was developed by banks issuing credit/debit cards in the wake of rampant fraud in the late 1990s with the rise of online shopping.¹⁴ Unifying their policies into the Payment Card Industry Data Security Standard (PCI DSS) in 2004, issuers required payment processors to be certified by independent external assessors on metrics such as internal security training and best practices, providing internal test cases to the external auditor, maintaining an inventory of software and documentation, mandating separation between development/production staff, and requiring segregation of functions indispensable to the organisation and subject to abuse, which were naturally viewed as more susceptible to cheating.¹⁵ This framework, spearheaded by card issuers with a motivation to decrease their fraud liability, improved security for payment processors globally by defining process-based and outcomes-based standards for software development, and it is likely that something analogous would mitigate AI evals gaming risks due to the inadequacy of current outcomes-based testing for detecting dangerous model behaviours.¹⁶

Conclusion

Through introducing the background, motivations, processes, and regulatory environment-dependent outcomes of the Volkswagen diesel emissions scandal, several analogies emerge that may better inform governance around AI evals and gaming of these regulations to target this problem on various levels such as incentive and liability structures, internal processes, and outcomes-based testing. It is hoped that this work leads to future research including into the effectiveness of various other regulatory frameworks, methods for more robust process auditing, and increasing cooperation between regulators to avoid race-to-the-bottom scenarios with incentives to defraud individual regulators.

Klier, T., & Linn, J. (2016). The VW Scandal and Evolving Emissions Regulations. Chicago Fed Letter. https://econpapers.repec.org/article/fipfedhle/00045.htm ↩︎
Gasoline and the environment - leaded gasoline - U.S. Energy Information Administration (EIA). (n.d.). Retrieved March 17, 2024, from https://www.eia.gov/energyexplained/gasoline/gasoline-and-the-environment-leaded-gasoline.php ↩︎
Robinson, A. (2015). Caught Black-Handed: Why Did Volkswagen Cheat? In Car and Driver. https://www.caranddriver.com/news/a15351476/caught-black-handed-why-did-volkswagen-cheat/ ↩︎
Ewing, J. (2016). VW Scandal Clouds Prospects for Other Diesel Makers at Geneva Motor Show. The New York Times. https://www.nytimes.com/2016/03/04/automobiles/wheels/vw-scandal-clouds-prospects-for-other-diesel-makers-at-geneva-motor-show.html ↩︎
SEC.gov \ Volkswagen Aktiengesellschaft, et al. (n.d.). Retrieved March 17, 2024, from https://www.sec.gov/litigation/litreleases/lr-24422 ↩︎
Wendler, A. (2015). How VW Got Busted for Skirting EPA Diesel Emissions Standards. In Car and Driver. https://www.caranddriver.com/news/a15352518/how-volkswagen-got-busted-for-gaming-epa-diesel-emissions-standards/ ↩︎
Oldenkamp, R., Zelm, R. van, & Huijbregts, M. A. J. (2016). Valuing the human health damage caused by the fraud of Volkswagen. Environmental Pollution, 212, 121–127. https://doi.org/10.1016/j.envpol.2016.01.053 ↩︎
Eger, T., & Schäfer, H.-B. (2018). Reflections on the Volkswagen Emissions Scandal [{SSRN} {Scholarly} {Paper}]. https://doi.org/10.2139/ssrn.3109538 ↩︎
Hubinger, E., Denison, C., Mu, J., Lambert, M., Tong, M., MacDiarmid, M., Lanham, T., Ziegler, D. M., Maxwell, T., Cheng, N., Jermyn, A., Askell, A., Radhakrishnan, A., Anil, C., Duvenaud, D., Ganguli, D., Barez, F., Clark, J., Ndousse, K., … Perez, E. (2024). Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. arXiv. http://arxiv.org/abs/2401.05566 ↩︎
Langosco, L., Koch, J., Sharkey, L., Pfau, J., Orseau, L., & Krueger, D. (2023). Goal Misgeneralization in Deep Reinforcement Learning. arXiv. http://arxiv.org/abs/2105.14111 ↩︎
Shevlane, T. (2022). Structured access: An emerging paradigm for safe AI deployment. arXiv. http://arxiv.org/abs/2201.05159 ↩︎
Pilz, K. (n.d.). An assessment of data center infrastructure’s role in AI governance. ↩︎
Trager, R., Harack, B., Reuel, A., Carnegie, A., Heim, L., Ho, L., Kreps, S., Lall, R., Larter, O., Ó hÉigeartaigh, S., Staffell, S., & Villalobos Ruiz, J. J. (2023). International Governance of Civilian AI: A Jurisdictional Certification Approach. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4579899 ↩︎
Worldpay Editorial Team. (n.d.). What’s the History of PCI DSS? - Insights \textbar{=latex} Worldpay from FIS. In FIS Global. Retrieved March 17, 2024, from https://www.worldpay.com/en/insights/article/pci-dss-history-everything-you-need-to-know ↩︎
Baykara, S. (2021). What is the Separation of Duties Principle and How Is It Implemented? In PCI DSS GUIDE. https://pcidssguide.com/what-is-the-separation-of-duties-principle-how-is-it-implemented/ ↩︎
Perez, E., Huang, S., Song, F., Cai, T., Ring, R., Aslanides, J., Glaese, A., McAleese, N., & Irving, G. (2022). Red Teaming Language Models with Language Models. arXiv. http://arxiv.org/abs/2202.03286 ↩︎