A lot of AI agent projects have a demo review.

Some have a security review. Some have a pilot recap. Some have a founder saying, “looks good, ship it.”

What they do not have is a real readiness review.

That is the gap between “this seems promising” and “this is safe enough to touch production.”

And that gap matters, because production does not care how good the demo looked. Production cares about boring questions:

  • does the workflow have a clear boundary?
  • does someone own it?
  • is failure visible?
  • is rollback possible?
  • are approvals clear?
  • will operators know what to do at 4:58 PM on a Friday when the agent gets weird?

If the answer to those questions is fuzzy, the workflow is not ready. It is just close enough to be dangerous.

What a readiness review actually is

A readiness review is not a giant committee ritual.

It is a short, hard checkpoint before go-live. The point is to force explicit answers on the things that cause the most production pain:

  • unclear scope
  • unclear ownership
  • unclear permissions
  • unclear exception handling
  • unclear rollback paths
  • unclear support expectations

A readiness review should not be theoretical. It should decide one thing:

Is this workflow ready to touch real work right now, or not?

That means the output is not a vibes-based summary. It is one of three calls:

  • ready
  • ready with constraints
  • not ready

That is it. If your review cannot produce a decision, it is not a review. It is a meeting.

Why teams skip this step

Usually because they think they already covered it somewhere else.

They assume:

  • the pilot proved readiness
  • the security review proved readiness
  • the happy path proved readiness
  • the builder’s confidence proved readiness

None of those prove readiness.

A pilot proves there may be value. A security review proves the system might be governable. A happy path proves almost nothing. And builder confidence is free.

Readiness is different. Readiness means the workflow is operationally legible enough to survive contact with reality.

The 10 checks in a real AI agent readiness review

You do not need a 90-question spreadsheet. You need ten checks that surface whether the workflow is actually shippable.

1. Workflow boundary

Can somebody describe exactly what the agent does in plain language?

For example:

“The agent reviews inbound intake submissions, validates required fields, enriches records from approved sources, drafts a routing recommendation, and sends low-confidence cases to a human queue.”

That is clear.

This is not clear:

“It helps automate operations with intelligent orchestration.”

If the workflow boundary is vague, everything downstream gets vague too:

  • permissions
  • ownership
  • expectations
  • blame

No boundary, no readiness.

2. Action policy

The team should be able to answer three questions immediately:

  • What can the agent do automatically?
  • What requires approval?
  • What is never allowed?

If that split is not explicit, you do not have a deployable workflow. You have a future incident.

This matters most when the workflow touches:

  • customer-facing communication
  • CRM or ERP writes
  • financial actions
  • contract language
  • legal or compliance decisions
  • access or permissions

If nobody can point to the action policy, the review should stop there.
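One way to make the split auditable is to keep it as data instead of prose buried in a prompt. Here is a minimal sketch in Python; the action names and tiers are made up for illustration, not a prescribed schema:

```python
from enum import Enum

class Tier(Enum):
    AUTO = "automatic"           # agent may act without a human
    APPROVAL = "needs_approval"  # agent drafts, human confirms
    FORBIDDEN = "never"          # agent must not attempt this

# Illustrative policy table: every action the agent can name gets a tier.
ACTION_POLICY = {
    "draft_routing_recommendation": Tier.AUTO,
    "send_customer_email": Tier.APPROVAL,
    "write_crm_record": Tier.APPROVAL,
    "issue_refund": Tier.FORBIDDEN,
}

def check_action(action: str) -> Tier:
    # Unknown actions default to FORBIDDEN, not AUTO: the safe failure mode.
    return ACTION_POLICY.get(action, Tier.FORBIDDEN)
```

The useful property is the default: anything the review never discussed is treated as forbidden, not as fair game.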

3. Data exposure

What data does the agent read, write, log, or pass through prompts?

You should be able to map:

  • source systems
  • destination systems
  • sensitive fields
  • retention concerns
  • logging behavior
  • whether PII or financial data is in scope

A lot of teams think they understand data exposure because they know what tools are connected. That is not enough.

Readiness means understanding the actual data path through the workflow.
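Mapping the data path can be as simple as declaring each hop explicitly so it can be reviewed. A hypothetical sketch, with invented system and field names:

```python
# Each hop: where data comes from, where it goes, and what rides along.
DATA_PATH = [
    {"from": "intake_form", "to": "agent_prompt",
     "fields": ["name", "email", "request_text"],
     "sensitive": ["email"], "logged": True},
    {"from": "agent_prompt", "to": "crm",
     "fields": ["routing_recommendation"],
     "sensitive": [], "logged": True},
]

def sensitive_and_logged(path):
    # Flag hops where sensitive fields would land in logs: the kind of
    # exposure that knowing "what tools are connected" does not surface.
    return [hop for hop in path if hop["sensitive"] and hop["logged"]]
```

Even a list this crude forces the conversation the review needs: which hops carry sensitive fields, and which of those get logged.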

4. Ownership

Who owns the workflow after launch?

Not in a vague strategic sense. Operationally.

Who owns:

  • business outcomes
  • approval rules
  • exception queues
  • change requests
  • failure review
  • the decision to pause or kill the workflow

If the answer is “kind of shared,” it is probably not owned. And unowned workflows rot fast.

5. Exception handling

Every AI agent demo looks cleaner than production. That is normal. The real test is what happens when the workflow sees:

  • missing inputs
  • contradictory inputs
  • low confidence
  • tool failures
  • stale data
  • downstream system errors

Where does that work go? Who sees it? What context do they get? What is the expected response time?

“Escalates to a human” is not a process. It is a sentence.
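Turning that sentence into a process means an exception becomes a routed work item with context and a deadline, not a log line. A sketch, with assumed field names:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class ExceptionItem:
    case_id: str
    reason: str          # e.g. "low_confidence", "missing_input"
    agent_context: dict  # what the agent saw and why it stopped
    queue: str = "ops-review"
    created_at: datetime = field(default_factory=datetime.now)

    def due_by(self, sla_hours: int = 4) -> datetime:
        # The SLA turns "escalates to a human" into a measurable commitment.
        return self.created_at + timedelta(hours=sla_hours)
```

Where the work goes (`queue`), what context travels with it (`agent_context`), and when a response is due (`due_by`) are exactly the questions the review asks.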

6. Failure visibility

If the workflow starts failing quietly, how long until somebody notices?

This is where a lot of teams get humbled. They assume someone will know because the workflow is important. That is not monitoring.

A readiness review should confirm:

  • what counts as failure
  • what metrics are tracked
  • where alerts go
  • who is expected to respond
  • whether the team can distinguish a bad run from a temporary blip

If failure detection depends on a customer complaining, the workflow is not ready.
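One concrete way to distinguish a bad run from a temporary blip is a rolling-window failure rate: one failure does not page anyone, a sustained problem does. A minimal sketch with illustrative thresholds:

```python
from collections import deque

class FailureMonitor:
    def __init__(self, window: int = 50, threshold: float = 0.2):
        self.runs = deque(maxlen=window)  # last N run outcomes
        self.threshold = threshold

    def record(self, success: bool) -> bool:
        """Record one run; return True if an alert should fire."""
        self.runs.append(success)
        failure_rate = self.runs.count(False) / len(self.runs)
        # Require a minimum sample so one early failure is a blip, not a page.
        return len(self.runs) >= 10 and failure_rate > self.threshold
```

The numbers here are placeholders; the point is that "what counts as failure" and "when do we alert" become explicit parameters someone owns.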

7. Rollback and kill path

What happens if the workflow needs to be paused right now?

Can the team:

  • disable automatic actions?
  • route everything to human review?
  • revert a prompt or config change?
  • switch to a fallback path?
  • isolate one tenant or one integration without dropping everything?

If rollback depends on finding the one builder who understands the wiring, that is not production readiness. That is dependency risk with nice branding.
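A kill path that does not depend on one builder can be as boring as a flag checked on every dispatch, with per-tenant isolation so you can pause one integration without dropping everything. A hypothetical sketch:

```python
class KillSwitch:
    """Pause the whole workflow, or isolate one tenant/integration."""

    def __init__(self):
        self.global_pause = False
        self.paused_tenants: set[str] = set()

    def pause_all(self) -> None:
        # One flip routes every subsequent action to human review.
        self.global_pause = True

    def pause_tenant(self, tenant: str) -> None:
        self.paused_tenants.add(tenant)

    def allows(self, tenant: str) -> bool:
        # The agent checks this before every automatic action.
        return not self.global_pause and tenant not in self.paused_tenants
```

The specifics will differ (feature flags, config service, circuit breaker), but the review question is the same: can a named operator flip this without reading the wiring?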

8. Operator support

Who carries the operational burden after launch?

This is where fake automation gets exposed. Sometimes the “automation” works by quietly creating more review work for humans. Sometimes it pushes ugly cases into a queue nobody staffed. Sometimes it creates a support expectation nobody priced.

A readiness review should ask:

  • who watches the queue?
  • who handles weird cases?
  • how much daily work is added?
  • is that work visible and accepted?

If the workflow saves leadership time by creating hidden operator cleanup, you have not improved the system. You have moved the pain.

9. Change control

How will this workflow be changed after launch?

That includes changes to:

  • prompts
  • rules
  • thresholds
  • retrieval sources
  • tool access
  • approval logic
  • destination mappings

Without a change process, teams accidentally turn production into a live experiment.

Readiness does not require heavyweight bureaucracy. It requires enough discipline that a meaningful change is deliberate, tested, and reversible.

10. Success criteria

What does “working” actually mean?

Not “people like it.” Not “the demo was impressive.”

Real success criteria sound like:

  • cut routing time by 40%
  • reduce manual triage load by 30%
  • keep false approvals under 2%
  • maintain under-4-hour exception turnaround
  • increase first-pass completeness from 60% to 85%

If success is undefined, the workflow will drift into politics. People will decide whether it is good based on whichever anecdote is loudest.

That is how a decent system gets killed or a bad one survives too long.
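Criteria like these can be written down as checkable thresholds rather than recollections. A sketch mirroring the examples above; the metric names and targets are illustrative:

```python
# metric name -> (comparator, target)
SUCCESS_CRITERIA = {
    "routing_time_reduction_pct": (">=", 40),
    "manual_triage_reduction_pct": (">=", 30),
    "false_approval_rate_pct": ("<=", 2),
    "exception_turnaround_hours": ("<=", 4),
    "first_pass_completeness_pct": (">=", 85),
}

def evaluate(measured: dict) -> dict:
    """Return pass/fail per metric that was actually measured."""
    ops = {">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b}
    return {
        metric: ops[op](measured[metric], target)
        for metric, (op, target) in SUCCESS_CRITERIA.items()
        if metric in measured
    }
```

When the numbers are written down before launch, the loudest anecdote stops being the scoreboard.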

What “ready with constraints” looks like

A lot of workflows are not fully ready, but still deployable in a narrower form. That is fine.

In fact, that is often the right call.

Examples:

  • allow draft generation, but not auto-send
  • allow internal routing, but not system-of-record writes
  • allow one business unit, but not all customers yet
  • allow low-risk cases only, with high-risk cases forced to review

This is the right use of constraints. Not as a way to dodge hard decisions, but as a way to reduce blast radius while the workflow earns trust.

A good readiness review does not ask, “can we launch everything?” It asks, “what version of this is safe to launch now?”

What should happen after the review

Do not end with a fuzzy summary. End with a decision log.

Capture:

  • readiness status
  • approved scope
  • known constraints
  • open blockers
  • named owners
  • rollback path
  • first review date after launch

That last one matters. A workflow can be ready today and messy two weeks later if volumes, inputs, or operator habits change.

Production readiness is not permanent. It is a status that has to be maintained.

The simple rule

If the team cannot explain how the workflow behaves when things go right, when things go wrong, and when things need to stop, the workflow is not ready.

That is the whole game.

A readiness review is just the shortest honest path to that answer.

If you want help turning a promising AI workflow into something you can actually ship without creating a cleanup job in disguise, check out the services page.