16 June 2026

EU AI Act · enforcement · AI audit · government · regulatory compliance · audit-washing

The Audit-Washing Problem: Real Compliance vs. Paperwork

Audit-washing isn't a new phenomenon. It's what happens when the compliance process becomes more important than the compliance outcome: companies produce the documentation, pass the audit, and operate exactly as they did before.

In financial services, audit-washing helped produce the conditions for the 2008 crisis. In environmental compliance, it produced the Volkswagen emissions scandal. In AI, it's going to produce a wave of EU AI Act certifications that bear no relationship to what those systems actually do.

For regulators and enforcement agencies, this is the problem that will define the next five years of AI governance.

What Audit-Washing Looks Like in AI

Under the EU AI Act, providers of high-risk AI systems must produce technical documentation, implement a risk management system, ensure human oversight, and complete a conformity assessment before the system goes on the market. For most Annex III categories, that assessment is a self-assessment under the internal control procedure; a third-party notified body is involved only in some cases.

Self-certification sounds straightforward. In practice, it creates a gap that sophisticated actors will exploit.

Here's what audit-washing looks like in the AI context:

Documentation theater. A company produces the Article 11 technical documentation required by the Act, but the documentation describes an idealised version of the system rather than the system as it actually operates. Training data documentation lists sources and quality measures that don't reflect actual data practices. Testing procedures describe methodologies that were never actually applied to the production system.

Retroactive compliance. A system is deployed, operated, and already causing real-world outcomes before compliance work begins. The compliance work is then done with the goal of producing documentation that makes the system look like it was always compliant, rather than identifying and fixing the actual gaps.

Scope games. A company classifies a system as minimal-risk or limited-risk when a reasonable reading of Annex III would place it as high-risk. The classification is backed by enough legal argumentation to be defensible, even if it doesn't reflect the system's real-world impact.

Risk management systems that don't manage risk. A company creates a risk management system document, assigns ownership, and runs annual reviews. But the risk management process doesn't connect to product development decisions, doesn't route risk findings to anyone who can act on them, and produces findings that are acknowledged and filed rather than remediated.

Why Standard Conformity Assessment Doesn't Catch It

Conformity assessment under the EU AI Act is primarily a documentation review process. For most Annex III categories, the provider carries it out itself under the internal control procedure. Where a third-party notified body is involved, it reviews the technical documentation and related artifacts to confirm they meet the regulation's requirements. It does not, as a matter of course, independently test the system.

This creates a structural vulnerability. The assessment verifies that the documentation says the right things. It doesn't verify that the system does what the documentation says.

There are analogies in other regulated industries. ISO 27001 certification verifies that a company has documented its information security management system. It doesn't verify that the company isn't experiencing breaches. A SOC 2 Type II audit attests to the design and operating effectiveness of controls over a review period. It doesn't prevent a compromise the day after the audit closes.

The gap matters more in AI than in these analogies because the distance between documented behaviour and actual behaviour can be larger and harder to detect. A model's behaviour in a controlled test environment can diverge substantially from its behaviour at scale, under distribution shift, or in adversarial conditions.
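One rough way to put a number on distribution shift is a population stability index (PSI) computed over a binned input feature. The sketch below is illustrative only: PSI is a technique chosen for this example, not something the Act prescribes, and the function name, the bin counts, and the epsilon floor are all assumptions. It also assumes both samples were binned with the same bin edges.

```python
import math


def population_stability_index(test_counts, production_counts, eps=1e-6):
    """Compare two binned distributions of the same feature.

    test_counts: per-bin counts from the documented test data.
    production_counts: per-bin counts from live operation, same bins.
    Returns 0.0 for identical distributions and grows as they diverge.
    """
    if len(test_counts) != len(production_counts):
        raise ValueError("both samples must use the same bins")

    test_total = sum(test_counts)
    production_total = sum(production_counts)

    score = 0.0
    for test_n, production_n in zip(test_counts, production_counts):
        # Floor each fraction at eps so an empty bin does not cause log(0).
        test_frac = max(test_n / test_total, eps)
        production_frac = max(production_n / production_total, eps)
        score += (production_frac - test_frac) * math.log(production_frac / test_frac)
    return score
```

A larger value means a bigger gap between the distribution the system was tested on and the one it sees in operation. Any threshold for concern would have to be set by the assessor for the system in question.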

The Enforcement Problem

For national competent authorities tasked with EU AI Act enforcement, audit-washing creates a resource problem.

If every high-risk AI system has a conformity assessment certificate, the certificate stops being a meaningful signal. Enforcement then has to shift from certificate-checking to substantive system assessment. That requires technical capability that most regulatory agencies don't currently have.

Substantive AI system assessment involves:

Testing model behaviour under adversarial conditions and distribution shift
Independently verifying training data provenance claims
Stress-testing the human oversight mechanisms under realistic operational conditions
Cross-referencing the documented risk management process with actual product development history
Testing whether the monitoring and logging systems actually capture what the documentation says they capture

This is closer to a technical audit than a documentation review, and it requires people who can read model cards, understand training procedures, and design test cases that reveal real-world model behaviour.

Most national competent authorities are currently staffed for document review, not technical assessment. That capability gap is what makes audit-washing viable.

What Genuine Compliance Looks Like

Genuine compliance has observable characteristics that distinguish it from documentation theater.

Compliance work started before the product shipped. The most reliable signal that compliance is genuine rather than retroactive is that it began during product development. Risk management processes that run during development actually shape product decisions. Retroactive compliance can produce similar documentation, but it can't produce a change history showing that risk findings altered the product.
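As a minimal sketch of that check, an assessor could look for risk findings that were raised before release and are linked to a recorded product change. The record layout here (the `id`, `raised_on`, and `linked_change` fields) is hypothetical; real evidence would come from issue trackers, version control, and release records.

```python
from datetime import date


def findings_that_shaped_product(findings, release_date):
    """Return IDs of risk findings raised before release and tied to a change.

    findings: list of dicts with 'id', 'raised_on' (datetime.date), and an
    optional 'linked_change' reference (hypothetical field names).
    """
    return [
        finding["id"]
        for finding in findings
        if finding["raised_on"] < release_date and finding.get("linked_change")
    ]


if __name__ == "__main__":
    example = [
        {"id": "R-1", "raised_on": date(2025, 3, 1), "linked_change": "change-17"},
        {"id": "R-2", "raised_on": date(2025, 9, 1), "linked_change": "change-42"},
        {"id": "R-3", "raised_on": date(2025, 2, 1), "linked_change": None},
    ]
    print(findings_that_shaped_product(example, date(2025, 6, 1)))
```

An empty result does not prove the compliance work was retroactive, but it is a reason to ask for the development history in more detail.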

The documentation matches observable system behaviour. If a company's Article 11 documentation claims that the system has a 95% accuracy rate across demographic groups, that claim should be verifiable through independent testing. If it claims a certain human oversight mechanism triggers under specific conditions, that mechanism should be observable in a live environment.
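A documented accuracy claim of this kind can be checked against an independently collected labelled sample. The following is a sketch under stated assumptions: the function names and the `(group, predicted, actual)` record format are invented for illustration, and a real assessment would also need sample sizes large enough to support confidence intervals.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, predicted, actual) tuples.

    Returns a dict mapping each group to its observed accuracy.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def groups_below_claim(records, claimed_accuracy, tolerance=0.0):
    """Return the groups whose observed accuracy falls short of the claim.

    tolerance is an assumed allowance for sampling noise, chosen by the assessor.
    """
    observed = accuracy_by_group(records)
    return {
        group: accuracy
        for group, accuracy in observed.items()
        if accuracy < claimed_accuracy - tolerance
    }
```

Reporting per-group figures rather than a single aggregate matters here, because an overall 95% figure can hide a much lower rate for a small group.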

The humans responsible for compliance have operational authority. In companies where compliance is genuine, the person responsible for the risk management system has the ability to delay product releases, require remediation, and escalate to leadership. In companies where compliance is performative, the compliance function is staffed and resourced, but has no practical authority over product decisions.

Incident history is documented and shows learning. Real AI systems encounter unexpected outputs, edge cases, and failure modes during operation. Companies with genuine compliance processes document these incidents, investigate them, and show evidence that the investigation changed something. Companies with audit-washing problems have clean incident logs or no logs at all.
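A first-pass screen of an incident log might look like the sketch below. The field names (`id`, `investigated`, `resulting_change`) are hypothetical, and the flags are prompts for follow-up questions rather than findings: an investigation can legitimately conclude that no change was needed.

```python
def incident_log_flags(incidents):
    """Return a list of human-readable flags for an incident log.

    incidents: list of dicts with 'id', 'investigated' (bool), and an optional
    'resulting_change' description (hypothetical field names).
    """
    if not incidents:
        # A system in real operation with no recorded incidents is itself a signal.
        return ["no incidents recorded"]

    flags = []
    for incident in incidents:
        if not incident.get("investigated"):
            flags.append(f"{incident['id']}: not investigated")
        elif not incident.get("resulting_change"):
            flags.append(f"{incident['id']}: investigated, no resulting change recorded")
    return flags
```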

Practical Steps for Enforcement Agencies

Given the resource constraints most agencies face, here's a practical prioritisation framework for substantive AI compliance assessment.

Prioritise by risk tier and incident signal. Start with Annex III high-risk systems in the sectors with the highest real-world impact: credit and lending, employment screening, biometric identification, and law enforcement. Within those, prioritise systems that have had observable incident reports, regulatory complaints, or adverse media coverage.
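The ordering described above can be sketched as a simple sort key: priority sectors first, then systems with the most incident signal. The sector labels, field names, and the choice to weight every signal type equally are assumptions made for illustration; an agency would set its own weights.

```python
from dataclasses import dataclass

# Sector labels are illustrative, taken from the sectors named in the text.
PRIORITY_SECTORS = {"credit", "employment", "biometric_identification", "law_enforcement"}


@dataclass
class SystemRecord:
    name: str
    sector: str
    incident_reports: int = 0
    complaints: int = 0
    adverse_media: int = 0


def priority_key(system):
    """Sort key: priority-sector membership first, then total incident signal."""
    in_priority_sector = system.sector in PRIORITY_SECTORS
    signal = system.incident_reports + system.complaints + system.adverse_media
    return (in_priority_sector, signal)


def prioritise(systems):
    """Return systems ordered from highest to lowest assessment priority."""
    return sorted(systems, key=priority_key, reverse=True)
```

Sorting on a tuple keeps the two criteria separate, so a system outside the priority sectors never outranks one inside them, however many complaints it has attracted. Whether that strict ordering is desirable is a policy choice.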

Develop adversarial testing protocols. The most efficient way to test whether a system's documentation is accurate is to test the system against the claims in the documentation. If documentation says the system achieves parity across demographic groups, test it. If it says monitoring triggers under specific conditions, trigger those conditions.
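A claim-by-claim test harness for documented triggers can be quite small. In the sketch below, `system` stands in for whatever interface the assessor has to the system under test, and `stub_system` is a made-up example that escalates to human review only below a confidence of 0.5; none of these names come from the Act or from any real product.

```python
def check_documented_triggers(system, documented_cases):
    """Run each documented trigger condition and report the ones that fail.

    system: callable taking an input payload and returning the event names
    it emitted (hypothetical interface).
    documented_cases: list of (description, payload, expected_event) tuples
    derived from the provider's documentation.
    """
    failures = []
    for description, payload, expected_event in documented_cases:
        emitted = set(system(payload))
        if expected_event not in emitted:
            failures.append(description)
    return failures


def stub_system(payload):
    """Stand-in system: requests human review only when confidence is below 0.5."""
    return {"human_review"} if payload["confidence"] < 0.5 else set()


if __name__ == "__main__":
    # Suppose the documentation claims human review triggers below 0.7.
    cases = [
        ("review at confidence 0.4", {"confidence": 0.4}, "human_review"),
        ("review at confidence 0.6", {"confidence": 0.6}, "human_review"),
    ]
    print(check_documented_triggers(stub_system, cases))
```

Each failure is a specific, reproducible mismatch between the documentation and the system, which is more useful in an enforcement file than a general impression that the documentation is optimistic.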

Build technical assessment capability internally. The long-term solution to the audit-washing problem is enforcement agencies that have technical staff capable of substantive AI assessment. That's a multi-year investment, but it's the only way to shift the incentive structure for AI providers.

Create safe reporting channels for internal whistleblowers. Some of the most effective signals about audit-washing will come from people inside the companies in question who know that the documentation doesn't reflect reality. The EU AI Act's whistleblower protection provisions (Article 87) create a framework for this; enforcement agencies should actively communicate these protections to potential whistleblowers.

Build enforcement precedent early. The first enforcement actions under the EU AI Act will set the standard for what substantive compliance looks like. Early cases that involve genuine technical assessment, rather than only documentation review, will create deterrence that document-only enforcement can't.

The Regulator's Genuine Advantages

The audit-washing problem is real, but regulators have genuine advantages that sophisticated AI providers should not underestimate.

Regulators have time. A company that passes a documentation review today is not protected from retrospective enforcement if its documentation is later found to have been misleading. The EU AI Act's enforcement timeline extends well beyond the initial compliance dates.

Regulators have access to operational data. Once enforcement is triggered, regulators have access to the internal documents, communications, and development history that distinguish genuine compliance from retroactive documentation. The gap between what a system was documented to do and what the development history shows it was actually designed to do is not easy to paper over under adversarial scrutiny.

Regulators have comparative data. As enforcement agencies build experience with substantive AI assessment, they'll develop pattern recognition for what genuine compliance looks like versus what audit-washing looks like. That pattern recognition gets better with each case.

The audit-washing problem is a transition challenge, not a permanent condition. Companies that are genuinely building compliant systems have nothing to fear from reviews becoming harder to pass on documentation alone.