
A3 — Robustness & Brittleness

Medium severity | NIST AI RMF MEASURE 2.6 | EU AI Act Art. 15 | ISO 42001 Cl. 8.4

Domain: A — Technical | Jurisdiction: Global


Layer 1 — Executive card

AI systems that pass testing can still fail unpredictably on unusual inputs, edge cases, or conditions not seen during training — and those failures may only surface in production.

A model can achieve high accuracy in testing and still fail catastrophically on specific input patterns that were absent from the test set. Waymo recalled 1,212 robotaxis in May 2025 after discovering a systematic failure on gates, chains, and gate-like roadway barriers that did not appear in testing. McDonald's IBM AI drive-thru system added hundreds of unwanted items to orders — including 260 chicken nuggets — under unusual ordering patterns. Both systems passed their original testing. Both failed in production.

Have our AI systems been tested against edge cases and adversarial inputs, and do they have a defined, safe fallback when they encounter inputs outside their design envelope?

AI testing cannot provide complete assurance — failure modes are emergent and may only appear in production. What you are approving is adversarial testing before go-live and a designed fallback when the system reaches its limits. This is the control that prevents the "it passed all our tests" post-incident explanation.


Layer 2 — Practitioner overview

Risk description

AI models are optimised to perform well on the data they were trained and tested on. Unlike traditional software where failure modes are predictable from the specification, AI model failure modes are emergent — they may be invisible until they occur in production. A model that achieves high accuracy on benchmark tests can still fail on specific input patterns absent from testing.

Likelihood drivers

  • Deployment environment differs materially from training data context
  • Insufficient adversarial testing before deployment
  • No OOD detection to flag inputs outside the training distribution
  • Model used beyond its documented operational design domain
  • No graceful degradation — the system emits confident outputs even when underlying certainty is low

Consequence types

Type | Example
Safety incident | Autonomous system failure on physical edge cases
Customer experience | AI system failing under unusual but realistic inputs
Reputational damage | Viral failure incidents (McDonald's nugget orders)
Financial liability | Consequential harm from high-stakes domain failures

Affected functions

Technology · Product · Operations · Customer Service · Risk

Controls summary

Control | Owner | Effort | Go-live? | Definition of done
Adversarial testing | Technology | Medium | Required | Structured adversarial test suite completed. Results documented. No critical failures at go-live.
OOD detection | Technology | Medium | Required | OOD mechanism active. Out-of-distribution inputs flagged or rejected. Threshold documented.
Operational design domain | Risk | Low | Required | AI Register defines the ODD — conditions under which model is approved to operate.
Graceful degradation | Technology | Medium | Required | Documented and tested fallback when confidence is low. Does not produce confident wrong output.

Layer 3 — Controls detail

A3-001 — Adversarial testing

Owner: Technology | Type: Detective | Effort: Medium | Go-live required: Yes

Systematically test models against edge cases, unexpected inputs, and adversarial examples before deployment. Include: boundary testing, noise injection, distribution shift testing, and semantic equivalence testing. Maintain a growing library of historical failure cases. Run on every model update.
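As a sketch, the noise-injection step might look like the following (the synthetic model and 0.1 noise scale are illustrative assumptions, not a prescribed test suite):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in for the model under test: a classifier on synthetic data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

def noise_injection_flip_rate(model, X, sigma=0.1, n_trials=20, seed=0):
    """Fraction of predictions that change under small Gaussian input noise."""
    rng = np.random.default_rng(seed)
    baseline = model.predict(X)
    flip_rates = []
    for _ in range(n_trials):
        noisy = X + rng.normal(scale=sigma, size=X.shape)
        flip_rates.append(np.mean(model.predict(noisy) != baseline))
    return float(np.mean(flip_rates))

rate = noise_injection_flip_rate(model, X)
print(f"Prediction flip rate under noise: {rate:.2%}")
```

A rising flip rate between model versions is an early brittleness signal even when headline accuracy is unchanged.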

A3-002 — OOD detection

Owner: Technology | Type: Preventive | Effort: Medium | Go-live required: Yes

Implement mechanisms to detect when inputs fall outside the training distribution. Flag or reject these inputs rather than processing them silently. Threshold defined and documented in the model risk record.

A3-003 — Graceful degradation

Owner: Technology | Type: Preventive | Effort: Medium | Go-live required: Yes

Design systems to fail safely — revert to human decision, flag for review — rather than producing confident wrong outputs when uncertainty is high. Test the fallback as rigorously as the primary system.
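A minimal sketch of the fallback logic, assuming the model exposes class probabilities (the 0.8 threshold is an illustrative assumption to be calibrated per system):

```python
def predict_with_fallback(proba, threshold=0.8):
    """Act on a prediction only when confidence clears the threshold;
    otherwise defer to a human reviewer instead of guessing."""
    top = max(proba)
    if top < threshold:
        return {"action": "defer_to_human", "confidence": top}
    return {"action": "auto_decision", "label": proba.index(top), "confidence": top}

print(predict_with_fallback([0.05, 0.92, 0.03]))  # confident: auto_decision
print(predict_with_fallback([0.40, 0.35, 0.25]))  # uncertain: defer_to_human
```

The deferral path should be exercised in testing with the same rigour as the happy path.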

KPIs

Metric | Target | Frequency
OOD detection rate | > 95% of intentional OOD inputs flagged in testing | Pre-deployment
Adversarial test pass rate | 100% of critical edge cases handled safely | Pre-deployment + quarterly

Layer 4 — Technical implementation

from sklearn.ensemble import IsolationForest
import numpy as np

# OOD detection via Isolation Forest
OOD_THRESHOLD = 0.0  # decision_function() < 0 signals an outlier; tune on held-out data

X_train = np.random.default_rng(0).normal(size=(1000, 8))  # placeholder for known-good training features

clf = IsolationForest(contamination=0.01, random_state=42)
clf.fit(X_train)  # Fit on known-good training data only

def check_ood(input_features):
    """Flag inputs that fall outside the training distribution for review."""
    score = clf.decision_function([input_features])[0]
    if score < OOD_THRESHOLD:
        return {"action": "flag_for_review", "ood_score": score}
    return {"action": "proceed", "ood_score": score}

# Tools: Giskard (AI testing), DeepChecks, ART (IBM Adversarial Robustness Toolbox)
# Conformal prediction: MAPIE
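Where a library is not available, split-conformal prediction (the technique behind MAPIE) can be sketched by hand; the synthetic data, model, and 90% coverage target below are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)

# Nonconformity score: 1 - probability assigned to the true class
cal_scores = 1 - model.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]

alpha = 0.1  # target 90% coverage
n = len(cal_scores)
qhat = np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

def prediction_set(x):
    """All labels whose nonconformity score falls within the calibrated quantile."""
    probs = model.predict_proba(x.reshape(1, -1))[0]
    return [c for c in range(len(probs)) if 1 - probs[c] <= qhat]

sets = [prediction_set(x) for x in X_cal[:100]]
```

A wide prediction set signals low confidence, which makes it a natural trigger for the graceful-degradation fallback above.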

Incident examples

Waymo recalls 1,212 robotaxis (May 2025): Waymo's fifth-generation ADS software failed to correctly detect and respond to chains, gates, and gate-like roadway barriers. The failure mode was absent from testing but present in real-world deployment. Sixteen low-speed collisions occurred before the software was updated. The NHTSA recall was filed in May 2025.

McDonald's IBM AI drive-thru discontinued (2024): The IBM-built automated order-taking system added unwanted items to orders under unusual inputs — including 260 chicken nuggets in one documented case. The system was discontinued at all 100+ test locations in July 2024 after viral incidents demonstrated brittleness under realistic but unusual user behaviour.


Scenario seed

Context: A hospital deploys a clinical AI diagnostic system that performs well on their imaging equipment. A rural facility partnership is announced.

Trigger: The rural facility uses different scan protocols. Clinical staff notice the AI's confidence scores are unusually low. They proceed anyway, trusting the system.

Complicating factor: The ODD was not defined — there is no technical control preventing use on out-of-distribution imaging data.

Discussion questions: What ODD documentation would have prevented deployment to the rural facility without validation? How should OOD detection be designed for clinical systems? Who is accountable for the deployment decision?

Difficulty: Intermediate | Jurisdictions: Global

[Full scenario with discussion questions available in the AI Risk Training Module — coming soon.]