Workflow 04 of 6

Failure Taxonomy.

Tagged postmortems, named-failure-mode catalog.

Without a taxonomy, every incident feels novel and you can't tell whether anything is improving. The Failure Taxonomy workflow names the failure modes — hallucination, refusal-mismatch, retrieval miss, drift, prompt-injection, and so on — so you can count them, attribute them, and trend them. The taxonomy is what turns an incident from "weird thing that happened" into "the third hallucination this quarter."

What this is

The Failure Taxonomy workflow is the procedure for maintaining a written list of named AI failure modes, tagging every production incident against it, adding new failure modes to eval suites as part of the incident response, and reviewing the taxonomy on a regular cadence so it doesn't go stale.
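One way to make the written list concrete is to keep it as data next to the code, so that tagging can be checked mechanically. The sketch below is a minimal illustration, not the Institute's reference implementation: the `FailureMode` class, the `TAXONOMY` mapping, the `validate_tags` helper, and the definition and example wording are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One named failure mode: a tag, a short definition, and an example."""
    tag: str
    definition: str
    example: str


# Hypothetical entries. The tag names come from the list on this page;
# the definitions and examples are placeholder wording to adapt.
TAXONOMY = {
    mode.tag: mode
    for mode in [
        FailureMode(
            "hallucination",
            "Output asserts something its sources do not support.",
            "Answer cites a policy clause that does not exist.",
        ),
        FailureMode(
            "retrieval-miss",
            "The relevant document existed but was not retrieved.",
            "Answer ignores the current pricing page.",
        ),
        FailureMode(
            "prompt-injection",
            "Untrusted input overrides the system instructions.",
            "A pasted email tells the model to reveal its prompt.",
        ),
    ]
}


def validate_tags(tags):
    """Return a list of problems; an empty list means the tags are acceptable."""
    if not tags:
        return ["postmortem has no taxonomy tag"]
    return [f"unknown tag: {tag}" for tag in tags if tag not in TAXONOMY]
```

A check like `validate_tags` could run in CI over the postmortem files, which is one way to enforce that an untagged postmortem is not considered done.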

The procedure

  1. Write the failure-mode list. Start with the obvious: hallucination, refusal-mismatch, retrieval miss, drift, length-bound violation, prompt-injection. Each one has a short definition and an example.
  2. Tag every postmortem. Every AI-related incident gets a postmortem; every postmortem gets one or more taxonomy tags. No tag = postmortem isn't done.
  3. Add new modes to evals immediately. A novel failure mode found in production becomes an eval case as part of the incident response — not "later when we have time."
  4. Review monthly. The taxonomy is a living document. New modes get added; obsolete modes get retired. The review meeting takes 30 minutes and produces a dated diff.
  5. Trend the tags. "We had three hallucination incidents this quarter, two last quarter" is the kind of sentence that drives investment decisions. The taxonomy makes that sentence possible.
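Step 5 can be sketched as a count of tagged postmortems per quarter. This is an illustrative sketch under stated assumptions: the `(date, tags)` record shape, the `quarter` and `tag_counts_by_quarter` helpers, and the sample incidents are all made up for the example, not real data.

```python
from collections import Counter
from datetime import date


def quarter(d):
    """Label a date with its calendar quarter, e.g. 2024-Q2."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def tag_counts_by_quarter(postmortems):
    """Count incidents per (quarter, tag); each postmortem counts once per tag."""
    counts = Counter()
    for occurred_on, tags in postmortems:
        for tag in set(tags):  # de-duplicate tags repeated within one postmortem
            counts[(quarter(occurred_on), tag)] += 1
    return counts


# Made-up sample postmortems: (incident date, taxonomy tags).
postmortems = [
    (date(2024, 1, 15), ["hallucination"]),
    (date(2024, 2, 3), ["hallucination", "retrieval-miss"]),
    (date(2024, 4, 9), ["hallucination"]),
    (date(2024, 5, 20), ["hallucination", "drift"]),
    (date(2024, 6, 11), ["hallucination"]),
]

counts = tag_counts_by_quarter(postmortems)
this_q = counts[("2024-Q2", "hallucination")]
last_q = counts[("2024-Q1", "hallucination")]
print(f"hallucination: {this_q} this quarter, {last_q} last quarter")
# prints "hallucination: 3 this quarter, 2 last quarter"
```

Counting each postmortem once per tag keeps a multi-tag incident from being dropped from any trend line, at the cost of tag totals summing to more than the number of incidents.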

What gets scored

Maturity dimension: Failure taxonomy. See the L1 → L5 progression for this dimension.

The five questions on the readiness self-assessment that score this dimension map one-to-one onto the five steps of the procedure above. Answering yes to a question means the artifact named in that step exists on disk in your repo today.

Phase 1 · in active development

This page is a thin first cut. Full procedural documentation — including reference DeepEval suite scaffolds, golden-set curation rubrics, and the audit-evidence checklist — lands in Phase 2 of the Institute build-out.

Find out where your team's Failure-Taxonomy workflow stands.

The free readiness self-assessment scores the Failure-Taxonomy workflow as one of six dimensions. Five minutes. Your weakest workflow is the one most worth fixing first.
