Is Your AI System High-Risk? A Researcher's Guide to EU AI Act Annex III Classification

The Hook

You've built a machine learning model that predicts student dropout rates. It uses academic performance data, attendance records, and socioeconomic indicators. The university's student services team wants to use it to flag at-risk students for early intervention. It sounds helpful, even benign. But here's the problem: under EU AI Act Annex III, category 3(b) covers AI systems intended to evaluate learning outcomes, including where those outcomes steer the learning process, and category 3(d) covers systems that monitor and detect prohibited student behaviour during tests. If your system influences decisions about educational support, it may qualify as high-risk. And high-risk means technical documentation, a conformity assessment, a CE marking process, and post-market monitoring. Getting this wrong isn't a technicality. It's a compliance failure with real legal exposure.

What Annex III Actually Covers

Annex III lists eight categories of AI systems presumed to be high-risk under Article 6(2). Each category is defined by both the domain and the function. Understanding the scope requires reading each one precisely.

Category 1: Biometrics

This covers remote biometric identification systems, whether real-time or post (1a). The key word is "identification." A system that only verifies that a person is who they claim to be, such as one-to-one face matching at a controlled access point, is expressly carved out of 1(a). The category also covers biometric categorisation according to sensitive or protected attributes (1b) and emotion recognition (1c). Some of these uses, such as inferring race or political opinions from biometric data, can fall under the Article 5 prohibitions rather than Annex III, so check both. Research labs building face recognition pipelines, emotion classifiers, or gait analysis tools need to pay close attention to whether the intended deployment involves identification of people in real or uncontrolled settings.

Category 2: Critical infrastructure

This addresses AI systems intended to be used as safety components in the management and operation of critical infrastructure: roads, water, gas, electricity, heating, and digital infrastructure. A traffic flow optimisation model is not automatically in scope. What triggers coverage is whether the system functions as a safety component. A load-balancing model for an electricity grid that can initiate load shedding crosses the threshold. An advisory dashboard that humans act on may not, depending on deployment context.

Category 3: Education and vocational training

This covers AI used to determine access to, or admission into, educational institutions (3a), to evaluate learning outcomes (3b), to assess the appropriate level of education a person will receive or be able to access (3c), and to monitor and detect prohibited student behaviour during tests (3d). The dropout prediction model described in the opening belongs here. So does an automated essay scoring system used for summative assessment, or a plagiarism detection tool that results in academic consequences.

Category 4: Employment, workers management, and access to self-employment

This is one of the broadest categories in practice. It captures AI used to recruit or select individuals, including targeting job ads and filtering applications (4a), and AI used to make decisions affecting working conditions, promotion, or termination, to allocate tasks based on individual behaviour or traits, or to monitor and evaluate performance (4b). A researcher building a resume screening tool, even as a research prototype, needs to assess whether the system is intended to be deployed in an employment context. Research on bias in hiring algorithms that involves actual resume scoring at scale may also fall here.

Category 5: Essential private and public services

This covers eligibility for public benefits or services, including public housing (5a), creditworthiness assessment and credit scoring, other than for detecting financial fraud (5b), and risk assessment and pricing for life and health insurance (5c). It also covers dispatching or prioritising emergency services (5d). A researcher building a creditworthiness model, even for academic benchmarking, should check whether the model is intended for actual use by a lender. If a financial institution is a research partner and plans to deploy the output, the provider classification analysis applies.

Category 6: Law enforcement

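This category covers AI used by or on behalf of law enforcement to assess a person's risk of becoming a victim of crime (6a), as polygraphs or similar tools (6b), to evaluate the reliability of evidence (6c), to assess the risk that an individual will offend or re-offend (6d), and to profile individuals during the detection, investigation, or prosecution of offences (6e). Deepfake detection and crime analytics appeared in the 2021 proposal but are not in the adopted list. Academic systems that profile individuals for predictive policing purposes or assist in evidence evaluation fall here. It's one of the most sensitive categories given the fundamental rights implications.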

Category 7: Migration, asylum, and border control management

This covers systems used to assess irregular migration risks, assist in asylum claim processing, and support visa or permit decisions. Researchers building risk classification tools in collaboration with immigration authorities need to consider this category carefully.

Category 8: Administration of justice and democratic processes

This covers AI used to assist judicial bodies in researching, interpreting, and applying the law to specific cases. An NLP system that summarises case law for judges to review is arguably in this space. A system that actively recommends outcomes or assigns risk scores to defendants is clearly in scope.

The Classification Test

Knowing the eight categories is only step one. Classification under Annex III requires a two-step analysis.

Step 1: Does the system fall within an Annex III category?

This requires matching your system's intended purpose to a listed category. The operative concept is "intended purpose," defined in Article 3(12) as the use for which a system is intended by the provider, including the specific context and conditions of use. A system built for research but intended to be deployed in a live hiring context is assessed against its intended deployment purpose, not its research framing.

Step 2: Does the system pose a significant risk to health, safety, or fundamental rights?

Article 6(2) presumes that systems falling within Annex III are high-risk. But Article 6(3) creates a path out. If a provider determines that a system listed in Annex III does not, in fact, pose a significant risk, the provider can self-assess it as non-high-risk. Under Article 6(4), the provider must document that assessment before placing the system on the market or putting it into service, and must provide the documentation to national competent authorities on request. The system still has to be registered in the EU database: the registration duty sits in Article 49(2), and the database itself is established under Article 71.

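There's a further nuance within Article 6(3) itself. The derogation is tied to listed conditions: the system performs a narrow procedural task, improves the result of a previously completed human activity, detects decision-making patterns or deviations without replacing or influencing the human assessment absent proper human review, or performs a task that is only preparatory to an assessment. The common thread is that the system doesn't materially influence the outcome of the relevant decision. For example, a spell-checker embedded in an HR platform doesn't become high-risk simply because it operates in an employment context. The derogation is never available for a system that profiles natural persons.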

The self-assessment under Article 6(3) is not a rubber stamp. The provider must document its reasoning, including what features of the system reduce risk. If your system is later audited and the self-assessment looks pretextual, you've created a worse compliance problem than if you'd just treated it as high-risk from the start.
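
The two-step analysis above can be sketched as a small screening function. This is a minimal illustration under stated assumptions, not a legal test: the class and field names (`SystemProfile`, `classify`, and so on) are hypothetical, the four boolean flags paraphrase the Article 6(3) conditions, and each flag hides a judgment call that needs documented reasoning.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    NOT_LISTED = "not listed in Annex III"
    HIGH_RISK = "high-risk"
    DEROGATION = "not high-risk (assessment must be documented)"


@dataclass
class SystemProfile:
    # Hypothetical field names chosen for this sketch, not terms defined in the Act.
    annex_iii_area: Optional[int]  # 1-8, or None if no listed use matches the intended purpose
    profiles_natural_persons: bool
    narrow_procedural_task: bool = False
    improves_completed_human_activity: bool = False
    detects_patterns_with_human_review: bool = False
    preparatory_task_only: bool = False


def classify(profile: SystemProfile) -> Outcome:
    """Simplified two-step screen; a real assessment needs legal judgment."""
    # Step 1: match the intended purpose against Annex III.
    if profile.annex_iii_area is None:
        return Outcome.NOT_LISTED
    if not 1 <= profile.annex_iii_area <= 8:
        raise ValueError("Annex III areas are numbered 1 to 8")

    # Profiling of natural persons rules out the Article 6(3) derogation.
    if profile.profiles_natural_persons:
        return Outcome.HIGH_RISK

    # Step 2: any one of the Article 6(3) conditions can support a derogation.
    derogation_conditions = (
        profile.narrow_procedural_task,
        profile.improves_completed_human_activity,
        profile.detects_patterns_with_human_review,
        profile.preparatory_task_only,
    )
    if any(derogation_conditions):
        return Outcome.DEROGATION
    return Outcome.HIGH_RISK


# The spell-checker example: employment context, but it only polishes human-written text.
spell_checker = SystemProfile(
    annex_iii_area=4,
    profiles_natural_persons=False,
    improves_completed_human_activity=True,
)
print(classify(spell_checker).value)
```

The function deliberately defaults to high-risk whenever no derogation condition is claimed, which mirrors the presumption in Article 6(2).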

Practical Research Scenarios

Scenario A: Academic lab building a bias-detection tool for hiring systems

This system ingests resumes and job descriptions and flags patterns that correlate with protected characteristics. The lab intends to partner with a mid-sized employer to validate the tool in a live recruitment pipeline.

Classification: High-risk. The tool is intended to be used in the context of employment recruitment and selection under Annex III, category 4(a). The fact that the primary goal is detecting bias doesn't remove it from scope. The intended deployment involves actual candidate evaluations. The lab, as provider, must comply with the requirements in Chapter III, Section 2, including technical documentation under Article 11, data governance under Article 10, human oversight provisions under Article 14, and accuracy and robustness requirements under Article 15.

Scenario B: Research system for urban traffic prediction

A research group has built a model that forecasts traffic congestion across a city's road network. The output is a heatmap visualisation reviewed by a city traffic management team, who make manual decisions about signal timing. The model does not directly control any infrastructure.

Classification: Likely not high-risk. Traffic prediction that functions as advisory information reviewed by human operators doesn't meet the threshold for Annex III category 2, which requires the system to function as a safety component in the management of critical infrastructure. The human decision layer is meaningful here. That said, if the city deploys this system to automatically control signals, the analysis changes entirely.

Scenario C: NLP tool for analysing judicial rulings

A computational law lab has built a tool that clusters judicial decisions by legal concept, extracts cited precedents, and identifies inconsistencies across rulings. It's designed as a research instrument for legal scholars. It doesn't recommend outcomes or score cases.

Classification: Likely not high-risk. Annex III category 8 targets AI that assists judicial bodies in applying law to specific cases. A descriptive analysis tool used by researchers, not by courts in live decision-making, doesn't fit that description. The provider should document the intended use clearly and ensure the tool isn't later repurposed in a judicial workflow without a fresh classification assessment.

Scenario D: Facial recognition system for research purposes

A university computer vision lab has trained a face recognition model using a publicly available dataset. The model is evaluated internally, with results published in a research paper. There's no deployment plan.

Classification: Depends on deployment context, and that's precisely the problem. The EU AI Act distinguishes between research and development and systems placed on the market or put into service. Article 2(6) takes systems developed and put into service solely for scientific research and development out of scope, and Article 2(8) does the same for research, testing, and development before market placement, except testing in real-world conditions. If the model stays in the lab and is never deployed to identify actual individuals, it may fall under the research exemption. But if the lab transfers the model to a third party, even under a research collaboration agreement, and that party uses it to identify individuals, the exemption disappears. Labs need to track what happens to their models after publication.

What to Do If You're High-Risk

If your system qualifies as high-risk under Annex III, here's what you need before placing it on the EU market or putting it into service.

Technical documentation under Article 11 and Annex IV must describe the system's intended purpose, architecture, training data, performance metrics, and known limitations. This documentation has to exist before deployment, not after.

A conformity assessment under Article 43 is required. For Annex III categories 2 through 8, providers conduct an internal conformity assessment based on internal control (Annex VI). Biometric systems in category 1 are the exception: the provider can choose between internal control and a notified body assessment (Annex VII) only if it has applied harmonised standards or common specifications in full. Where those don't exist or have been applied only in part, a notified body must be involved.
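
As a rough sketch of how the Article 43 route depends on the Annex III category, the function below encodes our reading of Article 43(1) and (2). The function name and the `standards_fully_applied` flag are hypothetical, and the flag compresses several conditions (harmonised standards or common specifications exist, and the provider applied them in full) into one boolean.

```python
def conformity_route(annex_iii_area: int, standards_fully_applied: bool = False) -> str:
    """Return the conformity assessment procedure for an Annex III high-risk system.

    Simplified: `standards_fully_applied` stands for "harmonised standards or
    common specifications exist and the provider has applied them in full".
    """
    if not 1 <= annex_iii_area <= 8:
        raise ValueError("Annex III areas are numbered 1 to 8")

    # Areas 2 to 8 follow the internal control procedure.
    if annex_iii_area != 1:
        return "internal control (Annex VI)"

    # Biometrics (area 1): the provider may choose only if standards were fully applied.
    if standards_fully_applied:
        return "internal control (Annex VI) or notified body (Annex VII), at the provider's choice"
    return "notified body (Annex VII)"


print(conformity_route(4))
print(conformity_route(1))
```

A hiring tool in category 4 lands on internal control, while a remote biometric identification system built without harmonised standards lands on a notified body assessment.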

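Registration in the EU database is mandatory before deployment. The duty comes from Article 49, and the database itself is established under Article 71. For most categories this is the public-facing record of high-risk AI systems operating in the EU. Critical infrastructure systems are registered at national level instead, and law enforcement and migration systems go into a non-public section.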

Post-market monitoring under Article 72 is ongoing. You'll need a monitoring plan that tracks real-world performance and incident reporting obligations under Article 73.

The compliance deadline for most high-risk systems under Annex III is 2 August 2026. Start documentation now, not six months from now.
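
One lightweight way to keep track of the four items above is a checklist that reports what's still open. This is only a sketch: the item keys and the `outstanding_items` helper are hypothetical names, and the article references are the ones this guide relies on, with registration tied to Articles 49 and 71.

```python
from typing import Iterable, List

# Hypothetical item keys mapped to the provisions discussed in this section.
PRE_DEPLOYMENT_ITEMS = {
    "technical_documentation": "Article 11 and Annex IV",
    "conformity_assessment": "Article 43",
    "eu_database_registration": "Articles 49 and 71",
    "post_market_monitoring_plan": "Article 72",
}


def outstanding_items(completed: Iterable[str]) -> List[str]:
    """List the checklist items not yet completed, in checklist order."""
    done = set(completed)
    unknown = done - set(PRE_DEPLOYMENT_ITEMS)
    if unknown:
        raise ValueError(f"unknown checklist items: {sorted(unknown)}")
    return [
        f"{name} ({reference})"
        for name, reference in PRE_DEPLOYMENT_ITEMS.items()
        if name not in done
    ]


for item in outstanding_items(["technical_documentation"]):
    print("still open:", item)
```

A checklist like this doesn't replace the substantive work behind each item. It just makes it harder to reach deployment with one of them silently missing.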

Get Your Classification Right

Annex III classification is one of the highest-stakes decisions your lab or institution will make under the EU AI Act, and it's not a decision that should rest on a single lawyer's reading or a two-page checklist.

Better Societies runs diagnostic sessions specifically for researchers and academic institutions working through exactly this question. We'll map your system against each Annex III category, walk through the Article 6(3) self-assessment criteria, and help you document your reasoning in a way that holds up to scrutiny.

If you're unsure whether your system is high-risk, that uncertainty is itself a signal. Book a session at bettersocieties.world/qualify.