The EU AI Act's Auditability Requirements: What Engineering Teams Must Build by 2026

Disclaimer: This article is an engineering interpretation of regulatory requirements. The EU AI Act is subject to ongoing interpretation by the EU AI Office and national competent authorities. Implementing acts, delegated acts, and harmonised standards referenced in the Act are still being finalized. Readers should consult qualified legal counsel for compliance advice specific to their organization, use cases, and jurisdictional exposure.

The EU AI Act entered into force on August 1, 2024. Its requirements for the high-risk AI systems listed in Annex III -- a category that captures many enterprise AI applications with real-world consequences -- apply from August 2, 2026. High-risk systems that are safety components of products regulated under Annex I follow on August 2, 2027. Neither is a policy date. Each is an engineering deadline.

Between now and August 2026, engineering teams building or operating high-risk AI systems that touch EU markets must build specific technical capabilities into their systems. Not documentation about those capabilities. Not roadmap commitments to implement them later. The actual artifacts, stores, processes, and interfaces that the Act requires.

This article breaks down Articles 9 through 15 of the Act -- the seven articles that define what high-risk AI systems must contain and produce -- and maps each to the engineering work required. The treatment is vendor-neutral. The goal is to give engineering and compliance teams a shared reference for what must be built, not how to build it with a specific tool.

Article 9: Risk Management System

Article 9 requires that a high-risk AI system operate within a risk management system that is established, documented, implemented, and maintained throughout the system's lifecycle. This is not a one-time risk assessment. It is a living process that must be updated when the system changes, when new risks are identified, or when operating conditions shift.

What Engineering Must Build

A risk register with version history. The risk register must capture identified risks, their assessed likelihood and severity, the mitigations applied, and the residual risk after mitigation. Each entry must be timestamped and attributable (who identified the risk, who approved the mitigation). The register must be version-controlled so that the risk posture at any historical point can be reconstructed.
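
One way to make the historical risk posture reconstructable is to store every change as a new immutable revision and derive snapshots from the revision history. The sketch below is a minimal in-memory illustration; the class names, field names, and the 1-to-5 scoring scale are assumptions made for the example, not terms defined by the Act.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class RiskRevision:
    """One immutable revision of a risk entry (all field names are illustrative)."""
    risk_id: str
    description: str
    likelihood: int       # 1 (rare) to 5 (almost certain), hypothetical scale
    severity: int         # 1 (negligible) to 5 (critical), hypothetical scale
    mitigation: str
    residual_risk: str    # e.g. "low", "medium", "high"
    identified_by: str
    approved_by: str
    recorded_at: datetime


class RiskRegister:
    """Append-only register: revisions are added, never edited or removed."""

    def __init__(self) -> None:
        self._revisions: list[RiskRevision] = []

    def record(self, revision: RiskRevision) -> None:
        self._revisions.append(revision)

    def as_of(self, moment: datetime) -> dict[str, RiskRevision]:
        """Return the latest revision of each risk recorded at or before `moment`."""
        snapshot: dict[str, RiskRevision] = {}
        for rev in sorted(self._revisions, key=lambda r: r.recorded_at):
            if rev.recorded_at <= moment:
                snapshot[rev.risk_id] = rev
        return snapshot


register = RiskRegister()
first = RiskRevision(
    risk_id="R-001",
    description="Lower accuracy for applicants over 60",
    likelihood=4, severity=4,
    mitigation="none yet", residual_risk="high",
    identified_by="analyst-a", approved_by="lead-b",
    recorded_at=datetime(2025, 1, 10, tzinfo=timezone.utc),
)
register.record(first)
# A mitigation is recorded as a new revision; the January entry stays untouched.
register.record(replace(
    first, likelihood=2, mitigation="Rebalanced training data",
    residual_risk="medium",
    recorded_at=datetime(2025, 3, 1, tzinfo=timezone.utc),
))

posture_in_february = register.as_of(datetime(2025, 2, 1, tzinfo=timezone.utc))
```

Here `posture_in_february["R-001"]` still shows the unmitigated "high" residual risk, which is the kind of point-in-time answer an auditor is likely to ask for. A production register would sit on durable, access-controlled storage rather than a Python list.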

Risk-triggered testing procedures. Article 9(6) requires that testing be appropriate to the intended purpose of the system and that it specifically address the risks identified in the risk management process. This means your testing infrastructure must be linked to your risk register: when a new risk is identified, a corresponding test case must be created. When a risk mitigation is implemented, the test must verify the mitigation's effectiveness.
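
The link between the register and the test suite can be checked mechanically. The following sketch assumes each test case declares the risk IDs it exercises; the mapping format and every identifier in it are hypothetical. It reports risks that no test covers, and tests that point at risks missing from the register.

```python
def coverage_gaps(risk_ids: set[str], test_links: dict[str, set[str]]) -> dict[str, list[str]]:
    """Compare registered risks against the risks that test cases claim to cover.

    `test_links` maps a test case name to the set of risk IDs it exercises.
    """
    covered = set().union(*test_links.values())
    return {
        "untested_risks": sorted(risk_ids - covered),
        "unknown_references": sorted(covered - risk_ids),
    }


risk_ids = {"R-001", "R-002", "R-003"}
test_links = {
    "test_subgroup_accuracy": {"R-001"},
    "test_input_range_guard": {"R-002", "R-009"},  # R-009 is not in the register
}
gaps = coverage_gaps(risk_ids, test_links)
# gaps == {"untested_risks": ["R-003"], "unknown_references": ["R-009"]}
```

Running a check like this in continuous integration, and failing the build when either list is non-empty, is one possible way to keep the two artifacts from drifting apart.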

Residual risk documentation with acceptance criteria. Article 9(5) requires that the residual risk associated with each hazard, and the overall residual risk of the system, be judged acceptable. Where risks cannot be eliminated, the deployer must receive the information required under Article 13. This calls for an explicit residual risk acceptance process -- documented, approved, and versioned -- whose results are included in the system's instructions for use.

The NIST AI RMF provides a complementary structure through its Govern, Map, Measure, and Manage functions. Organizations already implementing NIST AI RMF will find significant overlap with Article 9's requirements, but should verify that their NIST implementation includes the lifecycle maintenance and deployer communication elements that Article 9 specifically mandates.

Article 10: Data and Data Governance

Article 10 establishes requirements for the data used to train, validate, and test high-risk AI systems. Its scope extends beyond training data to include validation sets and test sets. The data a system processes in production is addressed separately, through the deployer's input-data obligation in Article 26(4).

What Engineering Must Build

Training data documentation. Article 10(2) requires documentation of data collection processes, data preparation operations (annotation, labeling, cleaning, enrichment), data provenance, and the purpose for which data was collected. For teams using third-party foundation models, this requirement creates a documentation chain: the model provider's training data documentation must be assessed for completeness, and any fine-tuning or retrieval-augmented generation data must be independently documented.

Bias examination records. Article 10(2)(f) requires examination of data for possible biases that may affect health and safety or lead to discrimination. This is not a general fairness statement. It requires documented analysis of the specific data sets used, the specific biases examined, the methodology applied, and the findings. The analysis must be updated when data sources change.

Data governance procedures. Article 10(2) mandates governance procedures covering data collection, relevance assessment, availability assessment, and quality assessment. These procedures must be documented and followed, not merely stated. For engineering teams, this typically means implementing data validation pipelines with documented quality thresholds, automated quality checks, and rejection logging for data that fails quality criteria.
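
A validation step with named quality checks and rejection logging might look like the sketch below. The check names and the age threshold are invented for illustration. A real pipeline would take its thresholds from the documented governance procedure and write rejections to durable storage.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Record = dict[str, Any]


@dataclass
class QualityGate:
    """Applies named quality checks and logs every record that fails one."""
    checks: dict[str, Callable[[Record], bool]]
    rejections: list[dict[str, Any]] = field(default_factory=list)

    def filter(self, records: list[Record]) -> list[Record]:
        accepted = []
        for index, record in enumerate(records):
            failed = [name for name, check in self.checks.items() if not check(record)]
            if failed:
                # Keep the reason for rejection so the decision can be audited later.
                self.rejections.append({"index": index, "failed_checks": failed})
            else:
                accepted.append(record)
        return accepted


gate = QualityGate(checks={
    "age_present": lambda r: r.get("age") is not None,
    "age_in_range": lambda r: r.get("age") is not None and 18 <= r["age"] <= 100,
})
rows = [{"age": 34}, {"age": None}, {"age": 140}]
clean = gate.filter(rows)
# clean == [{"age": 34}]; gate.rejections records why rows 1 and 2 were dropped.
```

Recording which check failed, rather than only a rejection count, is what lets the rejection log serve as evidence that the documented thresholds were actually applied.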

Article 11: Technical Documentation

Article 11 requires that technical documentation be drawn up before the high-risk AI system is placed on the market or put into service, and that it be kept up to date. Annex IV of the Act specifies the required contents in detail. This is the article that most directly translates into a substantial engineering documentation effort.

What Engineering Must Build

The documentation required by Annex IV includes:

General system description: Intended purpose, provider identity, system version, hardware and software prerequisites, and the system's interaction with other systems.
Detailed description of system elements: Development processes, design specifications, system architecture, computational resources, and the specific AI techniques used (model architecture, training methodology, inference pipeline).
Data documentation: The Article 10 data governance records, including training data characteristics, data quality measures, and data preparation procedures.
Monitoring, operation, and control: Performance metrics, accuracy specifications, known limitations, and the human oversight measures implemented.
Validation and testing: Test methodologies, test data sets, test results, and the metrics used to assess performance across relevant subgroups.
Change management: Description of systems and processes for managing changes to the system after initial deployment, including the predetermined change control plan if applicable.

MIT Technology Review's analysis of AI governance has noted that Annex IV's documentation requirements are among the most operationally demanding provisions of the Act, particularly for teams that have not been maintaining living technical documentation as part of their development process. Retrofitting this documentation for an existing system is significantly more expensive than maintaining it incrementally alongside development.

Article 12: Record-Keeping (Automatic Logging)

Article 12 is the article that most directly requires engineering infrastructure rather than documentation. It mandates that high-risk AI systems include automatic logging capabilities that record events relevant to identifying situations that may result in the system presenting a risk, that facilitate post-market monitoring, and that enable the tracing of the system's operation throughout its lifecycle.

What Engineering Must Build

Structured event logging with defined schema. Article 12(3) sets explicit minimum log contents only for remote biometric identification systems (Annex III, point 1(a)): the period of each use of the system, the reference database against which input data has been checked, the input data for which the search has led to a match, and the identification of the natural persons involved in the verification of results. For other high-risk systems, the Act leaves the schema to the provider, guided by the traceability purposes listed in Article 12(2). For decision-making systems, a reasonable translation is per-decision structured records capturing inputs, logic applied, outputs, and the human reviewers involved.

Immutability and retention. Article 12(2) requires that logging capabilities ensure a level of traceability appropriate to the system's intended purpose. The Act does not prescribe a storage technology, but in practice this points to append-only storage with retention periods aligned to the system's lifecycle and regulatory requirements (Articles 19 and 26(6) set a floor of at least six months for logs under the provider's or deployer's control). Logs that can be modified, overwritten, or deleted outside a controlled retention policy are hard to defend as satisfying the traceability requirement.
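
Hash chaining is one common technique for making an append-only log tamper-evident. It is offered here as an assumption about how a team might implement the requirement, not as something the Act mandates. In this in-memory sketch, each entry's hash covers the previous entry's hash, so altering any earlier record breaks verification of the chain.

```python
import hashlib
import json


class HashChainedLog:
    """Append-only log in which each entry commits to the one before it."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(prev_hash: str, body: str) -> str:
        return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

    def append(self, payload: dict) -> str:
        # Serialize deterministically so the same payload always hashes the same.
        body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry_hash = self._digest(prev_hash, body)
        self._entries.append({"body": body, "prev_hash": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed, or reordered entry fails."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if self._digest(prev_hash, entry["body"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


audit_log = HashChainedLog()
audit_log.append({"event_type": "decision", "decision": "rejected"})
audit_log.append({"event_type": "override", "decision": "approved"})
chain_intact = audit_log.verify()  # True while no entry has been altered
```

A chain like this detects tampering but does not prevent it. Preventing modification still depends on storage-level controls such as write-once media or restricted delete permissions.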

Accessibility for deployers. Article 26(6) requires deployers to keep the automatically generated logs to the extent those logs are under their control, and Article 13(3)(f) requires the instructions for use to describe the mechanisms that allow deployers to collect, store, and interpret them. This means the logging infrastructure must include access mechanisms -- APIs, export capabilities, or dashboards -- that allow deployers to retrieve and review logs for the systems they operate.

# Minimum log schema for Article 12 compliance
decision_log:
  log_id: "uuid-v4"                    # Unique log entry identifier
  timestamp: "ISO-8601"                # When the event occurred
  system_version: "semver"             # System version at time of event
  event_type: "decision | alert | override | error"
  input_summary:
    data_sources: ["list of sources consulted"]
    input_hash: "sha256"               # Integrity verification
  logic_applied:
    rule_set_version: "semver"         # Version of decision rules
    rules_evaluated: ["rule_ids"]      # Which rules were evaluated
    model_version: "identifier"        # If model contributed to decision
  output:
    decision: "outcome"
    confidence: "float, if applicable"
    explanation: "structured rationale"
  human_oversight:
    reviewer_id: "pseudonymized"       # If human reviewed the output
    review_action: "accepted | modified | rejected"
    review_timestamp: "ISO-8601"
  retention:
    policy: "retention_policy_id"
    expires: "ISO-8601"                # Earliest permissible deletion
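
A record following the schema above could be assembled along these lines. The function name and parameters are hypothetical. The sketch covers only the identification, input, logic, and output fields, and assumes the human oversight and retention fields are attached by later steps.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone

EVENT_TYPES = {"decision", "alert", "override", "error"}


def build_decision_log(event_type, system_version, raw_input, data_sources,
                       rule_set_version, rules_evaluated, decision, explanation):
    """Assemble one log entry as a JSON-serializable dict."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event_type: {event_type!r}")
    return {
        "log_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "system_version": system_version,
        "event_type": event_type,
        "input_summary": {
            "data_sources": list(data_sources),
            # A digest lets a reviewer confirm which input was used
            # without copying the raw input into the log.
            "input_hash": hashlib.sha256(raw_input).hexdigest(),
        },
        "logic_applied": {
            "rule_set_version": rule_set_version,
            "rules_evaluated": list(rules_evaluated),
        },
        "output": {"decision": decision, "explanation": explanation},
    }


entry = build_decision_log(
    event_type="decision",
    system_version="2.4.1",
    raw_input=b'{"applicant_id": "a-102"}',
    data_sources=["crm"],
    rule_set_version="1.7.0",
    rules_evaluated=["R12", "R31"],
    decision="refer_to_human",
    explanation="Rule R31 matched: income unverified",
)
serialized = json.dumps(entry)
```

Rejecting unknown event types at write time keeps the log queryable later. Whether to hash or store the raw input is a design choice that depends on data protection constraints and on how much reconstruction the intended purpose demands.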

Article 13: Transparency and Information to Deployers

Article 13 requires that high-risk AI systems be designed to be sufficiently transparent to enable deployers to interpret outputs and use the system appropriately. It mandates instructions for use that include specific categories of information.

What Engineering Must Build

A system card (instructions for use). Article 13(3) specifies the content: the provider's identity and contact details, the system's characteristics, capabilities, and limitations, the intended purpose, the level of accuracy and robustness achieved in testing, known circumstances that may affect performance, input data specifications, and human oversight measures. This is a version-controlled document that must be updated when the system changes materially.

Interpretability mechanisms. Article 13(1) requires that the system's operation be "sufficiently transparent." For decision-making systems, this means the system must produce outputs that a deployer can interpret -- not just a classification or score, but enough contextual information to understand why the system produced that output. Decision traces with structured explanations are one way to meet this requirement. Raw model logits, on their own, are not.

Performance documentation disaggregated by relevant subgroups. Article 13(3)(b)(v) requires that the instructions for use describe, when appropriate, the system's performance regarding the specific persons or groups of persons on which it is intended to be used. If the system makes decisions affecting individuals, aggregate accuracy metrics are unlikely to be sufficient. Performance should be documented for the relevant subgroups identified in the risk management process (Article 9) and the bias examination (Article 10).
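
To illustrate why aggregate metrics can mislead, the sketch below computes accuracy per subgroup from labelled outcomes. The group labels and sample data are made up for the example. Which subgroups matter for a given system is a question for the risk management and bias examination work.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per subgroup.

    `records` is an iterable of (group, predicted, actual) tuples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / totals[group] for group in totals}


records = [
    ("18-30", "approve", "approve"),
    ("18-30", "deny", "deny"),
    ("18-30", "approve", "deny"),
    ("18-30", "deny", "deny"),
    ("60+", "approve", "deny"),
    ("60+", "deny", "deny"),
]
per_group = accuracy_by_group(records)
overall = sum(p == a for _, p, a in records) / len(records)
# per_group == {"18-30": 0.75, "60+": 0.5}; overall is 4/6, which hides the gap.
```

With samples this small the figures mean little. A real report would also state the sample size or a confidence interval for each subgroup.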

Article 14: Human Oversight

Article 14 requires that high-risk AI systems be designed to allow effective human oversight during their period of use. This is not a policy statement about human-in-the-loop processes. It is an engineering requirement for specific technical capabilities.

What Engineering Must Build

Override mechanisms. Article 14(4)(d) requires that the system allow the human overseer to "decide, in any particular situation, not to use the high-risk AI system or to otherwise disregard, override or reverse the output of the high-risk AI system." This requires a technical override capability: a human reviewer must be able to accept, reject, or modify the system's output, and the override must be recorded in the system's logs (per Article 12).
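
A minimal sketch of an override path follows, assuming the three review actions used in the log schema earlier in this article. The function and class names are hypothetical. The point is that the system's original output is preserved alongside the reviewer's decision, so the override itself becomes an auditable record.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

REVIEW_ACTIONS = {"accepted", "modified", "rejected"}


@dataclass(frozen=True)
class ReviewedDecision:
    system_output: str            # what the AI system produced, kept unchanged
    final_output: Optional[str]   # what takes effect; None when rejected
    review_action: str
    reviewer_id: str              # pseudonymized identifier
    review_timestamp: str


def apply_review(system_output, review_action, reviewer_id, replacement=None):
    """Apply a human reviewer's decision and return the record to be logged."""
    if review_action not in REVIEW_ACTIONS:
        raise ValueError(f"unknown review_action: {review_action!r}")
    if review_action == "modified" and replacement is None:
        raise ValueError("a modified decision needs a replacement output")
    final_output = {
        "accepted": system_output,
        "modified": replacement,
        "rejected": None,
    }[review_action]
    return ReviewedDecision(
        system_output=system_output,
        final_output=final_output,
        review_action=review_action,
        reviewer_id=reviewer_id,
        review_timestamp=datetime.now(timezone.utc).isoformat(),
    )


overridden = apply_review("deny", "modified", "rev-7f3a", replacement="approve")
# overridden.final_output == "approve", while overridden.system_output stays "deny"
```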

Stop mechanisms. Article 14(4)(e) requires the ability to intervene in the operation of the system or interrupt it "through a 'stop' button or a similar procedure that allows the system to come to a halt in a safe state." For software systems, this translates to a graceful shutdown capability that can be activated by authorized personnel without data loss or corruption of the audit trail.
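
For a queue-driven service, one possible reading of "halt in a safe state" is: finish the item in flight, accept no new work, leave unprocessed items intact, and record who requested the stop. The sketch below implements that reading with Python's standard `threading` and `queue` modules. The class and its audit list are illustrative stand-ins, not a prescribed design.

```python
import queue
import threading


class StoppableWorker:
    """Processes queued items until an authorized user requests a stop."""

    def __init__(self, handler):
        self._handler = handler
        self._inbox = queue.Queue()
        self._stop_requested = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.audit = []  # stand-in for the system's real audit log

    def start(self):
        self._thread.start()

    def submit(self, item):
        """Queue an item; refuse new work once a stop has been requested."""
        if self._stop_requested.is_set():
            return False
        self._inbox.put(item)
        return True

    def wait_until_idle(self):
        """Block until every queued item has been handled."""
        self._inbox.join()

    def stop(self, requested_by, timeout=5.0):
        """Halt after the in-flight item; unprocessed items stay in the queue."""
        self.audit.append({"event": "stop_requested", "by": requested_by})
        self._stop_requested.set()
        self._thread.join(timeout)
        outcome = "stop_timed_out" if self._thread.is_alive() else "halted"
        self.audit.append({"event": outcome, "unprocessed": self._inbox.qsize()})

    def _run(self):
        while not self._stop_requested.is_set():
            try:
                item = self._inbox.get(timeout=0.05)
            except queue.Empty:
                continue
            try:
                self._handler(item)
            finally:
                self._inbox.task_done()


results = []
worker = StoppableWorker(handler=results.append)
worker.start()
worker.submit("case-1")
worker.wait_until_idle()
worker.stop(requested_by="oversight-officer-17")
accepted_after_stop = worker.submit("case-2")  # False: the system is halted
```

The sketch leaves authorization out entirely. A real stop control would verify that the requester is permitted to halt the system before acting on the request.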

Comprehensible output formats. Article 14(4)(a) requires that human overseers be able to "properly understand the relevant capacities and limitations of the high-risk AI system and be able to duly monitor its operation." This is a design requirement: the system's outputs, dashboards, and monitoring interfaces must be designed for comprehension by the intended human overseers, not just by the engineers who built the system.

Article 15: Accuracy, Robustness, and Cybersecurity

Article 15 requires that high-risk AI systems achieve an appropriate level of accuracy, robustness, and cybersecurity throughout their lifecycle. The key word is "throughout" -- this is not a launch criterion but an ongoing operational requirement.

What Engineering Must Build

Continuous performance monitoring. Article 15(1) requires that the system achieve an appropriate level of accuracy for its intended purpose and perform consistently throughout its lifecycle. This requires monitoring infrastructure that detects performance degradation over time -- model drift, data distribution shift, or accuracy decline in specific subgroups. The Act does not say how to monitor, but periodic manual review rarely catches degradation in time, so the monitoring should be automated.
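
A rolling-window accuracy check is among the simplest forms of automated degradation detection. The sketch below assumes ground-truth labels eventually arrive for production decisions, which is not true of every system. The window size and threshold are arbitrary example values.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent labelled outcomes."""

    def __init__(self, window: int, threshold: float) -> None:
        self._outcomes = deque(maxlen=window)  # oldest outcome drops off automatically
        self._threshold = threshold

    def accuracy(self) -> float:
        return sum(self._outcomes) / len(self._outcomes)

    def observe(self, predicted, actual) -> bool:
        """Record one labelled outcome; return True if an alert should fire.

        No alert fires until the window is full, to avoid noise from tiny samples.
        """
        self._outcomes.append(predicted == actual)
        window_full = len(self._outcomes) == self._outcomes.maxlen
        return window_full and self.accuracy() < self._threshold


monitor = RollingAccuracyMonitor(window=4, threshold=0.75)
stream = [(1, 1), (1, 1), (0, 0), (1, 1), (1, 0), (0, 1)]
alerts = [monitor.observe(predicted, actual) for predicted, actual in stream]
# alerts == [False, False, False, False, False, True]
```

The alert fires on the sixth observation, when two of the last four predictions are wrong. Running one monitor per subgroup extends the same idea to the subgroup-level decline mentioned above.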

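Robustness testing against errors and faults. Article 15(4) requires that high-risk AI systems be as resilient as possible to errors, faults, or inconsistencies that may occur within the system or the environment in which it operates. Testing should therefore cover malformed or out-of-range inputs, failures of dependent components, and, for systems that continue to learn after deployment, feedback loops in which biased outputs influence future inputs. Deliberately adversarial inputs, including data poisoning and adversarial examples where applicable to the system's input modalities, fall under the cybersecurity provisions of Article 15(5).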

Cybersecurity measures specific to AI vulnerabilities. Article 15(5) specifically addresses cybersecurity threats unique to AI systems, including training data poisoning, adversarial inputs designed to cause misclassification, model inversion or data extraction attacks, and exploitation of model or system vulnerabilities. This goes beyond standard application security: the threat model must include AI-specific attack vectors.

Provider vs. Deployer: Who Builds What

The Act distinguishes between providers (organizations that develop or place AI systems on the market) and deployers (organizations that use AI systems under their authority). Articles 9 through 15 primarily obligate providers. However, deployer obligations under Article 26 create requirements that depend on the provider having built the necessary infrastructure.

Requirement	Provider Obligation	Deployer Obligation
Risk management (Art. 9)	Establish, implement, and maintain risk management system	Operate system in accordance with instructions for use; report serious incidents
Data governance (Art. 10)	Document training data, validate for bias, implement quality controls	Ensure input data is relevant and sufficiently representative
Technical documentation (Art. 11)	Create and maintain Annex IV documentation	Retain documentation received from provider; make available on request
Automatic logging (Art. 12)	Build logging capability into the system	Retain logs for minimum 6 months; make available on request
Transparency (Art. 13)	Design for interpretability; provide instructions for use	Use system per instructions; inform affected individuals when applicable
Human oversight (Art. 14)	Build override and stop mechanisms; design for comprehensibility	Assign human overseers; ensure they can exercise oversight effectively
Accuracy and robustness (Art. 15)	Achieve and maintain appropriate accuracy; test for robustness	Monitor performance; report degradation to provider

a16z's AI regulatory analysis emphasizes that many technology companies will occupy both roles simultaneously -- they are providers of their own AI-powered products and deployers of third-party foundation models. This dual role means the company must satisfy both columns of the table above, potentially for different components of the same system.

CE Marking and the Conformity Assessment

High-risk AI systems covered by the Act must undergo a conformity assessment before being placed on the EU market. For most high-risk systems (those not covered by existing EU product safety legislation), this is a self-assessment conducted by the provider under the internal control procedure. The conformity assessment verifies that the system complies with Articles 9 through 15 and the other applicable requirements. After a successful assessment, the provider draws up an EU declaration of conformity and affixes the CE marking.

The conformity assessment is evidence-based. It requires the provider to demonstrate -- not merely claim -- that each requirement is satisfied. The engineering artifacts described in this article are the evidence. A system without structured logging cannot demonstrate Article 12 compliance. A system without versioned decision logic cannot demonstrate Article 13 transparency. A system without override mechanisms cannot demonstrate Article 14 human oversight.

The assessment must be repeated when the system undergoes substantial modification. The Act defines substantial modification as a change made after the system is placed on the market or put into service that was not foreseen or planned in the initial conformity assessment and that either affects the system's compliance with the applicable requirements or modifies its intended purpose. In practice, this means that major model retraining, significant rule changes, or expansion to new use cases may trigger a new conformity assessment. The change management documentation required by Article 11 (Annex IV) provides the basis for determining whether a change is substantial.
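
A change record can carry the fields needed for a first-pass triage of this question. The sketch below is an engineering aid under stated assumptions, not a legal test. It assumes that changes already described in a predetermined change plan covered by the initial assessment do not count as substantial, an interpretation to confirm with counsel. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    description: str
    covered_by_predetermined_plan: bool  # described in the initial assessment
    affects_compliance: bool             # touches an Article 9-15 requirement
    changes_intended_purpose: bool


def flag_for_reassessment(change: ChangeRecord) -> bool:
    """First-pass triage only; the final call belongs to compliance and legal."""
    if change.covered_by_predetermined_plan:
        return False
    return change.affects_compliance or change.changes_intended_purpose


changes = [
    ChangeRecord("Scheduled monthly retraining per change plan", True, False, False),
    ChangeRecord("Extend credit model to tenant screening", False, False, True),
    ChangeRecord("Fix typo in dashboard label", False, False, False),
]
flagged = [c.description for c in changes if flag_for_reassessment(c)]
# flagged == ["Extend credit model to tenant screening"]
```

The value of a record like this lies less in the boolean logic than in forcing each change to answer the three questions explicitly, with a named person accountable for the answers.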

Implementation Sequencing for August 2026

For engineering teams that have not yet begun implementation, the following sequence prioritizes the artifacts that are prerequisites for others:

1. Classify your systems. Determine which systems are high-risk under Annex III. Non-high-risk systems have lighter obligations. Misclassification in either direction creates risk -- under-classification creates compliance exposure; over-classification wastes engineering resources.
2. Implement automatic logging (Article 12). Logging is the foundation. Decision traces, performance monitoring, human oversight records, and incident documentation all depend on logging infrastructure. Build it first.
3. Externalize and version decision logic. Articles 9, 11, and 13 all depend on the ability to identify, version, and retrieve the logic governing system behavior. Extracting decision rules from prompts, application code, or model weights into versioned, auditable artifacts is typically the highest-effort work item.
4. Build human oversight mechanisms (Article 14). Override capabilities, stop mechanisms, and comprehensible output formats. These are engineering features that require design, implementation, and testing.
5. Compile technical documentation (Article 11). With logging, versioned rules, and oversight mechanisms in place, the Annex IV documentation can be compiled from evidence rather than reconstructed from memory.
6. Conduct risk management and data governance reviews (Articles 9, 10). Formalize the risk register, document training data provenance, conduct bias examinations. These benefit from having the logging and documentation infrastructure already in place.
7. Perform conformity assessment and apply CE marking. The self-assessment verifies that all preceding work is complete and the system satisfies the applicable requirements.

Teams that begin this sequence in mid-2026 will not complete it by August. The logging and rule externalization steps alone typically require three to six months for production systems. The compliance deadline is an engineering planning constraint, not a sprint target.

The Engineering Reality

The EU AI Act's Articles 9 through 15 are regulatory text, but their requirements are engineering deliverables. A risk management system is a versioned risk register with linked test cases. Data governance is a documented data pipeline with quality thresholds and bias analysis. Technical documentation is a maintained set of artifacts that describe the system as it currently operates. Automatic logging is an append-only structured event store. Transparency is decision traces with structured explanations. Human oversight is override mechanisms with audit records. Accuracy and robustness are continuous monitoring with drift detection.

None of these capabilities are exotic. All of them exist in well-governed software systems outside the AI domain. What the Act requires is that AI systems be governed with the same rigor that has been standard practice in safety-critical software for decades. The engineering work is substantial, but it is not unprecedented. The deadline, however, is fixed.

For engineering teams assessing their readiness, the diagnostic question is straightforward: for each of Articles 9 through 15, can you produce the required artifact today? Not a plan to produce it. Not a document describing what it will contain. The actual artifact, current, versioned, and backed by the infrastructure that maintains it. Where the answer is no, the gap between today and August 2026 is the scope of the remaining work.