Mastering Article 11: How to Automate Your Technical Documentation

Jasper Claes

⚡ TL;DR

  • Article 11 requires a complete, living Technical File before market placement — kept current for the system’s entire operational life.
  • Roughly 60–70 % of Annex IV content can be auto-populated from data that already exists in your ML pipeline, model registry, and observability stack.
  • Teams that treat the Technical File as a product artefact maintained in version control — not a compliance document assembled pre-audit — spend a fraction of the time and produce far better output.

The conversation plays out the same way in almost every engineering org confronting Article 11 for the first time. A compliance manager says “we need a Technical File for the hiring-AI product.” An ML engineer asks “what’s in it?” The answer — eight Annex IV sections spanning system architecture, data governance, bias testing, cybersecurity, human oversight, and a signed Declaration of Conformity — lands like a sprint that is definitely going to take longer than a sprint.

The anxiety is real but mostly misplaced. Most of the information Article 11 requires already exists somewhere in your organisation. It lives in MLflow experiment logs, DVC dataset registries, Confluence architecture pages, and Jira tickets — none of which speak Annex IV. The solution is not to write more documentation. It is to build pipelines that extract what already exists and route it into a structure that answers the specific questions the regulation asks.

This post gives you the precise automation map: what to extract from where, which sections require human judgment instead of extraction, and how to build a trigger system that keeps the file current without a recurring manual burden. For the definitive blueprint of every Annex IV section’s required content, see our companion pillar: Article 11 & Annex IV: How to Build a Compliant AI Technical File.

What Article 11 Actually Demands — and Why “Sufficient” Is a High Bar

Article 11(1) requires providers of high-risk AI systems to “draw up technical documentation in accordance with Annex IV” before placing their system on the market. The documentation must be sufficient for competent authorities and notified bodies to independently assess compliance — meaning a technically literate external assessor, who has never spoken to your team, can verify every material compliance claim without asking follow-up questions.

Three obligations compound the baseline requirement:

  • Pre-market timing. The file must exist before your first EU customer goes live, not during a subsequent audit cycle.
  • Perpetual currency. Article 11(1) mandates the file be “kept up to date.” Every substantial modification — new model version, new dataset, expanded use case — may require an update.
  • Ten-year retention. Per Article 18, the file must be accessible to market surveillance authorities for ten years after the system is withdrawn from the market.

These three obligations together mean that treating the Technical File as a project deliverable is structurally the wrong approach. It needs to be a living system: automated feeds from your engineering pipeline, full version-control history, and named owners for the judgment-dependent sections.

The Annex IV Automation Map: Extract vs. Author

The eight Annex IV sections fall into three automation tiers. Understanding this taxonomy before you start stops teams from manually writing content that could be generated, and stops them from expecting automation to produce content that requires legal judgment.

Annex IV Section | Tier | Primary Data Source | Human Input Required For
§1 General Description & Intended Purpose | ⚠️ Author | Product spec, model registry name/version metadata | Intended purpose, negative scope, affected persons, foreseeable misuse — all legal commitments requiring sign-off
§2 Design & Development Process | 🔄 Semi-auto | IaC configs, CI/CD pipeline, API schema, model registry | Design rationale; third-party component compliance status; human oversight architecture
§3 Training, Validation & Test Data | ✅ Largely auto | DVC lineage, dataset registry, demographic stats pipeline, bias eval outputs | Acceptability judgment for residual bias; data governance policy authorship
§4 Monitoring, Logging & Performance | ✅ Largely auto | MLflow / W&B logs, model card, observability platform | Known-limitations narrative; conditions-of-degraded-performance interpretation
§5 Cybersecurity Documentation | 🔄 Semi-auto | Pen-test reports, adversarial robustness outputs, SAST/DAST results | Threat model authorship; adversarial test scope; residual risk judgments
§6 Testing & Validation Results | ✅ Largely auto | Evaluation pipeline outputs, MLflow experiment registry, test CI logs | Pre-determined thresholds (must be authored before tests run); test plan authorship
§7 Post-Market Monitoring & Deployer Instructions | ✅ Largely auto | Logging config, monitoring dashboards, alerting rules | Human oversight instruction narrative; deployer-facing capability/limitation language
§8 Declaration of Conformity | ⚠️ Author | QMS records, conformity assessment outputs | Entire document — legal instrument requiring a named, authorised signatory

Automation Recipes by Section

Sections 3 & 6 — Automate Data and Test Documentation First

These two sections are the highest ROI targets because they are data-dense, updated most frequently, and most commonly incomplete in pre-audit Technical Files.

For Section 3, the core tooling is dataset versioning and provenance tracking. DVC (Data Version Control) creates a cryptographic lineage record tying every training run to the exact dataset version used — satisfying the Annex IV provenance requirement once the pipeline is configured. Supplement with Hugging Face Dataset Cards for structured dataset characteristic documentation whose schema maps directly onto Annex IV §3 fields.
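DVC produces this lineage automatically once configured. As a plain-Python illustration of the underlying idea — content-addressed hashing, which DVC implements with md5 internally — the sketch below ties a run ID to the exact bytes of a dataset. The `record_lineage` helper and its field names are hypothetical, not part of any DVC API.

```python
import hashlib
import json

def dataset_fingerprint(data: bytes) -> str:
    """Content hash identifying this exact dataset version (DVC uses md5 internally)."""
    return hashlib.md5(data).hexdigest()

def record_lineage(run_id: str, dataset_name: str, data: bytes) -> dict:
    """Tie a training run to the dataset version it consumed (hypothetical schema)."""
    return {
        "run_id": run_id,
        "dataset": dataset_name,
        "dataset_hash": dataset_fingerprint(data),
    }

record = record_lineage("run-042", "cv-screening-train", b"applicant_id,label\n1,0\n")
print(json.dumps(record))
```

Because the hash is deterministic, any later retrain against unchanged data produces the same fingerprint, and any silent data drift surfaces as a mismatch in the Section 3 record.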

For bias evaluation, integrate Fairlearn (Microsoft’s open-source fairness toolkit) or IBM AI Fairness 360 into your evaluation pipeline. Configure both to emit structured JSON reports at evaluation time; attach those reports directly to the Section 3 record as timestamped evidence.
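Fairlearn's `MetricFrame` produces exactly this kind of per-group breakdown in practice. The stdlib sketch below only shows the shape of the structured JSON evidence to attach to the Section 3 record; the `bias_report` function and its field names are illustrative assumptions, not Fairlearn's API.

```python
import json
from collections import defaultdict

def selection_rates(predictions, groups):
    """Per-group positive-prediction rate: the raw input to a disparate-impact check."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += pred
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

def bias_report(predictions, groups, model_version: str) -> dict:
    """Emit a timestamped-evidence-style record (hypothetical schema)."""
    rates = selection_rates(predictions, groups)
    return {
        "model_version": model_version,
        "metric": "selection_rate",
        "by_group": rates,
        "max_disparity": max(rates.values()) - min(rates.values()),
    }

report = bias_report([1, 0, 1, 1, 0, 0], ["a", "a", "a", "b", "b", "b"], "v2.3.1")
print(json.dumps(report, indent=2))
```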

For Section 6, the critical architectural decision is pre-registering acceptance thresholds as metadata before each experiment run executes. In MLflow, log thresholds as run parameters rather than post-hoc tags. In Weights & Biases, include them in the run config object. This creates the timestamped audit trail showing thresholds preceded results — a specific point assessors check because post-hoc threshold-setting is a well-known data integrity failure mode.
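In MLflow this pre-registration is a `mlflow.log_param` call made before training starts; in W&B it lives in the run config. A library-agnostic sketch of the ordering guarantee assessors look for — the audit-log structure here is an illustrative assumption:

```python
import json
import time

audit_log = []

def log_event(kind: str, payload: dict):
    audit_log.append({"ts": time.time(), "kind": kind, **payload})

# 1. Pre-register acceptance thresholds BEFORE any evaluation runs.
thresholds = {"accuracy": 0.92, "max_group_disparity": 0.05}
log_event("thresholds_registered", {"thresholds": thresholds})

# 2. ... training and evaluation happen here ...
results = {"accuracy": 0.94, "max_group_disparity": 0.03}
log_event("results_logged", {"results": results})

# 3. The timestamped trail itself proves thresholds preceded results.
t_thresh = next(e["ts"] for e in audit_log if e["kind"] == "thresholds_registered")
t_result = next(e["ts"] for e in audit_log if e["kind"] == "results_logged")
assert t_thresh <= t_result, "thresholds must be registered before results"

passed = (results["accuracy"] >= thresholds["accuracy"]
          and results["max_group_disparity"] <= thresholds["max_group_disparity"])
print(json.dumps({"passed": passed}))
```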

Section 4 — Auto-Generate Your Model Card

Section 4 is essentially a model card with regulatory framing. If you follow Google’s Model Cards framework or the Hugging Face Model Card spec, you have the Section 4 foundation already. The remaining step is ensuring the card includes four Annex IV-specific fields often absent from standard model cards: conditions under which performance degrades, specific population groups tested, pre-determined thresholds per metric, and the deployed system’s logging capabilities.

Configure your experiment tracking platform to auto-generate the model card as a pipeline artefact at the end of each evaluation run — pulled from logged metrics, tagged dataset versions, and pre-registered thresholds. The result is a Section 4 that is always current without anyone writing it from scratch.
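A minimal sketch of that generation step, assuming the metadata has already been pulled from your tracking platform. The input dictionary keys and the rendered layout are illustrative, not a fixed schema; note the four Annex IV-specific fields alongside the standard metrics.

```python
def render_model_card(meta: dict) -> str:
    """Render a Section 4 model card from experiment-tracking metadata.
    Field names are illustrative, not a fixed schema."""
    lines = [
        f"# Model Card — {meta['model']} {meta['version']}",
        f"Dataset version: {meta['dataset_version']}",
        "",
        "## Metrics vs. pre-determined thresholds",
    ]
    for name, value in meta["metrics"].items():
        lines.append(f"- {name}: {value} (threshold: {meta['thresholds'][name]})")
    # The Annex IV-specific fields standard model cards often omit:
    lines += [
        "",
        f"## Conditions of degraded performance\n{meta['degradation_conditions']}",
        f"## Population groups tested\n{', '.join(meta['groups_tested'])}",
        f"## Logging capabilities\n{meta['logging_capabilities']}",
    ]
    return "\n".join(lines)

card = render_model_card({
    "model": "cv-screener", "version": "v2.3.1",
    "dataset_version": "d41d8cd9",
    "metrics": {"accuracy": 0.94}, "thresholds": {"accuracy": 0.92},
    "degradation_conditions": "CVs shorter than 50 words; non-EU languages.",
    "groups_tested": ["age bands", "gender", "nationality"],
    "logging_capabilities": "Per-decision input hash, score, and model version.",
})
print(card)
```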

Section 2 — Extract Architecture from Infrastructure-as-Code

Your Kubernetes manifests or Terraform configurations already describe your deployment topology. Your OpenAPI specifications already describe input/output contracts. Your CI/CD pipeline configuration already describes your development process. Build an extraction step in your pipeline that produces: an architecture summary from IaC configs; an input/output specification from your API schema in non-engineer language; and a third-party component registry listing every external model, API, or dataset with compliance status flagged. Any “unknown” compliance status triggers a human review task before the Technical File can be submitted.
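The "unknown status blocks submission" rule is the easiest piece to automate. A sketch, assuming a registry of components with an optional `compliance_status` field — the registry shape is hypothetical:

```python
def review_tasks(components: list[dict]) -> list[str]:
    """Names of third-party components whose compliance status is unresolved.
    Anything never recorded defaults to 'unknown', i.e. fails closed."""
    return [c["name"] for c in components
            if c.get("compliance_status", "unknown") == "unknown"]

registry = [
    {"name": "embedding-api", "compliance_status": "verified"},
    {"name": "resume-parser-lib"},  # status never recorded -> unknown
]
blocked = review_tasks(registry)
if blocked:
    print(f"Technical File submission blocked; review needed for: {blocked}")
```

Failing closed on a missing field, rather than requiring an explicit "unknown", means a newly added dependency cannot slip through just because no one filled in the registry entry.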

Section 5 — Ingest Security Scan Outputs Automatically

AI-specific Section 5 requires two things standard IT security documentation does not: adversarial robustness testing and an AI-specific threat model. The threat model requires human authorship; test results can be ingested automatically. Use the OWASP Top 10 for LLM Applications as your structured adversarial testing taxonomy — it classifies AI-specific attack vectors (prompt injection, training data poisoning, model inversion) and is recognised by European regulators as a credible reference framework. Configure your adversarial testing tools to emit OWASP-classified reports and ingest them automatically into the Section 5 record.
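The ingest step then only has to group findings by category and reject anything unclassified. In the sketch below the category IDs follow the OWASP Top 10 for LLM Applications naming (LLM01 is prompt injection, LLM03 is training data poisoning); the report shape itself is a hypothetical assumption.

```python
# Category IDs per the OWASP Top 10 for LLM Applications; report schema is illustrative.
OWASP_LLM_CATEGORIES = {"LLM01", "LLM02", "LLM03", "LLM04", "LLM05",
                        "LLM06", "LLM07", "LLM08", "LLM09", "LLM10"}

def ingest_findings(raw_findings: list[dict]) -> dict:
    """Group adversarial-test findings by OWASP category; quarantine unclassified ones."""
    section5_record = {"classified": {}, "rejected": []}
    for f in raw_findings:
        cat = f.get("owasp_category")
        if cat in OWASP_LLM_CATEGORIES:
            section5_record["classified"].setdefault(cat, []).append(f["title"])
        else:
            section5_record["rejected"].append(f["title"])
    return section5_record

section5_record = ingest_findings([
    {"owasp_category": "LLM01", "title": "Indirect prompt injection via CV text"},
    {"title": "Untagged finding"},
])
print(section5_record)
```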

The Documentation Trigger System

Without a trigger system, the Technical File decays silently. With one, every significant change produces a documented update and the file always describes the actual operational system.

Change Event | Trigger Level | Automated Action | Human Review Gate
Model retrain — same architecture & dataset | Low | Auto-refresh §4 & §6 from new experiment run | None if metrics within documented thresholds
Dataset version update | Medium | Auto-update §3 provenance; re-run bias eval | Bias lead reviews new fairness outputs before release
Architecture or component change | High | Re-extract §2; flag §5 for adversarial retest | Compliance team assesses substantial-modification threshold
Intended-use expansion or new deployment context | Critical | Block deployment until §1 updated and new conformity assessment completed | Full compliance team review; potential Notified Body re-assessment
Upstream GPAI model update | Medium–High | Alert compliance team; re-run baseline performance tests | Team determines if post-market monitoring shows material behaviour change

Implement triggers as a mandatory CI/CD gate: a pre-deployment check that classifies the proposed change and refuses to deploy until the appropriate documentation action is completed and reviewed. This makes documentation maintenance a gate, not a suggestion.
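The gate's decision logic can be as simple as a classification table that fails closed. A sketch, assuming change events arrive as labelled strings from your pipeline; the event names and function are illustrative, and a real gate would also check metrics against documented thresholds for the low tier:

```python
# Trigger levels mirror the change-event table in this article.
TRIGGER_LEVELS = {
    "retrain_same_arch": "low",
    "dataset_update": "medium",
    "architecture_change": "high",
    "intended_use_expansion": "critical",
}

def deployment_gate(change_event: str, docs_reviewed: bool) -> bool:
    """Return True if deployment may proceed."""
    level = TRIGGER_LEVELS.get(change_event, "critical")  # unknown changes fail closed
    if level == "low":
        return True  # auto-refresh of §4/§6 only, no review gate
    return docs_reviewed  # medium and above require a completed documentation review

assert deployment_gate("retrain_same_arch", docs_reviewed=False) is True
assert deployment_gate("intended_use_expansion", docs_reviewed=False) is False
print("gate checks passed")
```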

The Human Judgment Layer: What Automation Cannot Replace

Three categories always require named human accountability regardless of tooling sophistication.

Intended purpose and negative scope (Section 1). These are legal commitments. “The system is intended for initial CV screening within EU financial services. It is not designed for performance evaluation, promotion decisions, or termination processes.” That language requires a product lead and legal reviewer — not a template fill.

Residual risk acceptability (linked to Article 9). Automated pipelines can surface risks. They cannot determine whether a residual 3 % disparate false-positive rate against a demographic group is acceptable given the system’s benefits and available mitigations. That judgment requires a named risk owner who can defend the decision in writing.

Declaration of Conformity (Section 8). Article 47 requires the Declaration to be signed by a person with authority to bind the legal entity. It can be generated as a template, but the final instrument requires a named signatory and a genuine conformity assessment process behind it. See our post on passing an AI Act conformity assessment.

Frequently Asked Questions

What exactly must Article 11 technical documentation contain?

Annex IV specifies eight areas: (1) general description including intended purpose, version, and instructions for use; (2) design and development process including architecture, compute resources, and third-party components; (3) training, validation, and test data including provenance, preprocessing, demographic composition, and bias assessment; (4) performance information including a model card with pre-determined thresholds and known limitations; (5) cybersecurity documentation including AI-specific threat model and adversarial robustness results; (6) complete testing and validation evidence with pre-dated thresholds; (7) post-market monitoring plan and human oversight instructions for deployers; and (8) the signed EU Declaration of Conformity. Every section must be specific enough for an external assessor to independently verify material compliance claims without asking follow-up questions.

How often must the Technical File be updated?

Whenever the system changes in a way that affects the accuracy of documented claims or the system’s compliance status. In practice: every retrain producing materially different metrics (§4, §6); every dataset version change (§3); every architecture change (§2, §5); any use-case expansion (all sections, plus potential new conformity assessment). Build change-triggered updates into your CI/CD pipeline to prevent silent decay, and retain all historical versions for ten years after system withdrawal.

Can we use our existing model card as the Technical File?

No — but it is an excellent foundation for Section 4. A standard model card covers performance characteristics, limitations, and evaluation methodology well. The Technical File additionally requires §1 intended purpose and legal scope, §2 architecture and development process, §3 complete data governance records with provenance, §5 adversarial robustness documentation, §6 formal test results with pre-dated thresholds, §7 deployer instructions, and §8 the Declaration of Conformity. Build the Technical File around the model card, not in place of it.

Does Article 11 apply to AI systems we use internally?

Yes. Article 2 defines scope as systems placed on the market or put into service — which includes internal deployment for high-risk purposes. If your organisation builds an AI system used internally for employment decisions, credit decisions, or other Annex III purposes, you are a provider of a high-risk AI system and Article 11’s Technical File obligation applies. See our post on provider vs. deployer obligations.

What are the penalties for an incomplete Technical File?

Operating a high-risk AI system without the required Technical File violates Article 16(d) — subject to Tier 2 penalties under Article 99: up to €15 million or 3 % of global annual turnover, whichever is higher. Providing a false or incomplete file to market surveillance authorities is a Tier 3 violation: up to €7.5 million or 1 % of turnover. The Technical File is also the first document any market surveillance authority will request when investigating an AI-related complaint.

Ready to automate your Annex IV Technical File?

Unorma’s Document Generator connects to your ML infrastructure — MLflow, DVC, W&B, Fairlearn — and auto-populates 60–70 % of your Technical File. Human review tasks surface only where judgment is genuinely required.

Automate Your Technical File →
