Durable AI Workflows: Convex Workflows + Declarative State Machines
When building AI applications that go beyond simple chat interfaces, you quickly hit a wall: LLM calls are unreliable, slow, and expensive. How do you build a multi-stage analysis pipeline that survives API timeouts, handles human review gates, and doesn't lose state when things go wrong?
The answer used to require heavyweight infrastructure like Temporal. Now there's a simpler path: Convex Workflows for durability, with an optional declarative layer for complex state machine semantics.
The Problem with AI Pipelines
Most AI frameworks treat workflows as linear chains. LangChain, for example, encourages you to pipe prompts together sequentially. This works for demos, but production systems need:
- Durability: If Gemini's API times out at step 3 of 5, don't lose steps 1-2.
- Human-in-the-loop: Pause for review, then resume exactly where you left off.
- Parallel execution: Run 5 analysis prompts simultaneously, not sequentially.
- Branching logic: Take different paths based on what the AI discovered.
Traditional state machine libraries (XState, Robot) solve the state problem but weren't designed for server-side persistence. Building your own durability layer is complex and error-prone.
The Pragmatic Foundation: Convex Workflows
Don't rebuild infrastructure that exists. Convex Workflows is now production-ready and provides:
- Durable execution with step-level persistence
- Automatic retries with configurable backoff
- Load balancing via Workpool
- Survival across server restarts
- Replay from the last incomplete step
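As a sketch, retry behavior and parallelism can be tuned when constructing the manager. The option names below follow the `@convex-dev/workflow` component; treat the exact names and defaults as an assumption to verify against the current docs:

```typescript
import { WorkflowManager } from "@convex-dev/workflow";
import { components } from "./_generated/api";

// Configure default retry behavior and worker parallelism for all steps.
// Option names are from @convex-dev/workflow; verify against current docs.
export const workflow = new WorkflowManager(components.workflow, {
  workpoolOptions: {
    maxParallelism: 10,          // cap concurrent actions across workflows
    retryActionsByDefault: true, // retry failed action steps automatically
    defaultRetryBehavior: {
      maxAttempts: 3,
      initialBackoffMs: 250,
      base: 2,                   // exponential backoff: 250ms, 500ms, 1s
    },
  },
});
```

With this in place, individual steps inherit the retry policy without any per-step boilerplate.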
This is the hard infrastructure work—solved. Your AI pipeline becomes a series of durable steps:
```typescript
import { v } from "convex/values";
import { WorkflowManager } from "@convex-dev/workflow";
import { components, internal } from "./_generated/api";

const workflow = new WorkflowManager(components.workflow);

export const analysisPipeline = workflow.define({
  args: { threadId: v.id("threads") },
  handler: async (step, args) => {
    // Step 1: Parse the thread
    const parsed = await step.runAction(
      internal.stages.parseThread,
      { threadId: args.threadId }
    );

    // Step 2: Run parallel analysis (fan-out)
    const [darvo, tactics, psych] = await Promise.all([
      step.runAction(internal.stages.analyzeDarvo, { parsed }),
      step.runAction(internal.stages.analyzeTactics, { parsed }),
      step.runAction(internal.stages.analyzePsychological, { parsed }),
    ]);

    // Step 3: Synthesize results
    const synthesis = await step.runAction(
      internal.stages.synthesize,
      { darvo, tactics, psych }
    );

    return synthesis;
  },
});
```
Each step.runAction is persisted. If the server restarts mid-workflow, it replays from the last incomplete step. This is Temporal-style durability without running Temporal.
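The replay behavior is easier to see with a toy memoizing step runner (a simplified illustration, not Convex's actual implementation): completed step results are journaled, so re-running the handler after a crash skips straight past them.

```typescript
// Toy illustration of step-level replay (not Convex's real implementation).
// Completed step results are journaled; on replay, journaled steps return
// their saved result instead of re-executing.
type Journal = Map<string, unknown>;

async function runStep<T>(
  journal: Journal,
  key: string,
  fn: () => Promise<T>
): Promise<T> {
  if (journal.has(key)) return journal.get(key) as T; // replay: skip work
  const result = await fn();
  journal.set(key, result); // persist before moving on
  return result;
}

// Usage: run the handler twice, as if the second run were a replay.
async function demo() {
  const journal: Journal = new Map(); // would live in the database
  let expensiveCalls = 0;

  const handler = async () => {
    const a = await runStep(journal, "parse", async () => {
      expensiveCalls++; // stands in for a slow LLM call
      return "parsed";
    });
    return runStep(journal, "synthesize", async () => `${a}:done`);
  };

  await handler();                  // first run: both steps execute
  const replayed = await handler(); // replay: journaled steps are skipped
  return { replayed, expensiveCalls };
}
```

The expensive step runs exactly once across both invocations, which is the property that makes mid-pipeline crashes cheap.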
When You Need More: Declarative State Machines
Convex Workflows handles the durability infrastructure. But some workflows need explicit state machine semantics:
- Human gates with timeouts: Pause for 24 hours awaiting review, auto-advance if no response
- Conditional branching with guards: Take different paths based on analysis results
- Visual workflow representation: See the entire state graph at a glance
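The "pause with timeout, auto-advance" semantics of a human gate can be sketched as a race between a human decision and a timer. This is plain TypeScript with hypothetical names; in a real system the decision would arrive via a mutation and the timer via a scheduled function, but the shape is the same:

```typescript
type ReviewDecision = "APPROVE" | "REJECT" | "TIMEOUT";

// Race a human decision against a deadline; auto-advance on timeout.
function humanGate(
  decision: Promise<ReviewDecision>,
  timeoutMs: number
): Promise<ReviewDecision> {
  const timer = new Promise<ReviewDecision>((resolve) =>
    setTimeout(() => resolve("TIMEOUT"), timeoutMs)
  );
  return Promise.race([decision, timer]);
}
```

A `TIMEOUT` result would then map to the auto-advance transition in the machine definition, while `APPROVE`/`REJECT` follow the explicit edges.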
For these cases, build a thin declarative layer on top of Workflows:
```typescript
const analysisPipeline = defineMachine({
  id: 'analysisPipeline',
  initial: 'pending',
  context: { darvoDetected: false },
  states: {
    pending: state(on('START', 'parsing')),
    parsing: state(
      on('SUCCESS', 'analysis'),
      on('ERROR', 'failed')
    ),
    analysis: state.parallel({
      states: {
        darvo: state(on('COMPLETE')),
        tactics: state(on('COMPLETE')),
        psychological: state(on('COMPLETE')),
      },
      onDone: [
        { target: 'darvoPath', guard: ctx => ctx.darvoDetected },
        { target: 'standardPath' },
      ],
    }),
    darvoPath: state(on('DONE', 'awaitingReview')),
    standardPath: state(on('DONE', 'synthesis')),
    awaitingReview: state.human({
      timeout: '24h',
      instruction: 'Please verify the analysis results.',
      on: {
        APPROVE: 'synthesis',
        REJECT: 'analysis',
      },
    }),
    synthesis: state(on('DONE', 'complete')),
    failed: state.final(),
    complete: state.final(),
  },
});
```
This declarative definition compiles down to Workflow steps. The state.parallel() becomes a fan-out pattern. The state.human() becomes a step that waits for external input with a scheduled timeout. You get the readability of a state machine with the durability of Workflows.
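A minimal sketch of the "compiles down" idea (greatly simplified, hypothetical shapes): walk the machine from its initial state, executing one durable step per state and following transitions until a final state. A real compiler would also emit parallel regions, guards, and human gates.

```typescript
// Greatly simplified: each state runs one async "step" and returns the next
// state name, or null when final. A real compiler would emit Workflow steps.
type StateDef<C> = (ctx: C) => Promise<string | null>;

async function runMachine<C>(
  initial: string,
  states: Record<string, StateDef<C>>,
  ctx: C
): Promise<string[]> {
  const visited: string[] = [];
  let current: string | null = initial;
  while (current !== null) {
    visited.push(current);
    current = await states[current](ctx);
  }
  return visited;
}

// Usage: a tiny pipeline with a guard on the branch after analysis.
const trace = runMachine(
  "parsing",
  {
    parsing: async () => "analysis",
    analysis: async (ctx) => (ctx.darvoDetected ? "darvoPath" : "standardPath"),
    darvoPath: async () => null,
    standardPath: async () => null,
  },
  { darvoDetected: true }
);
```

Each iteration of the loop is where a durable `step.runAction` would go, which is why the declarative layer can stay thin.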
The Architecture: Layers of Abstraction
Layer 1: Convex Workflows (Infrastructure)
Handles durability, retries, persistence, replay. You don't touch this.
Layer 2: Declarative State Machine (Optional DSL)
Provides state(), state.parallel(), state.human(), guards, and conditional transitions. Compiles to Workflow steps.
Layer 3: LLM Orchestration (Ax Integration)
Wraps your AI generators with retry logic, parallel execution, and partial-failure handling.
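Partial-failure handling in this layer can be sketched with `Promise.allSettled`: run every generator, keep the successes, and record which analyses failed instead of failing the whole fan-out. The generator names here are illustrative, not from a specific library:

```typescript
// Fan out over several LLM "generators"; tolerate individual failures.
type Analysis = { name: string; result: string };

async function analyzeAll(
  generators: Record<string, () => Promise<string>>
): Promise<{ succeeded: Analysis[]; failed: string[] }> {
  const names = Object.keys(generators);
  const settled = await Promise.allSettled(names.map((n) => generators[n]()));
  const succeeded: Analysis[] = [];
  const failed: string[] = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      succeeded.push({ name: names[i], result: outcome.value });
    } else {
      failed.push(names[i]); // retry later, or synthesize without it
    }
  });
  return { succeeded, failed };
}
```

Downstream, the synthesis step can decide whether the surviving analyses are enough to proceed or whether the failed ones should be retried first.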
The key insight: separate what you build from what you use. Build the declarative DSL if it helps your team reason about complex workflows. Use Convex Workflows for everything else.
Why This Matters
Latency: Parallel execution means 5 analysis calls complete in the time of the slowest one, not the sum of all five.
Reliability: Each step is persisted. A timeout in Gemini doesn't lose your progress.
Debuggability: Workflow execution history is your audit log. You can see exactly what happened, when, and why.
Human Collaboration: First-class support for human gates means you can build "augmented intelligence" systems where AI does the heavy lifting and humans verify the results.
The Bigger Picture
Most AI engineering today is "hope-based"—hoping the API doesn't time out, hoping the response parses correctly, hoping the user doesn't refresh the page mid-analysis. Production systems can't run on hope.
The solution isn't to build more infrastructure. It's to use infrastructure that already exists (Convex Workflows) and layer on the abstractions your team actually needs (declarative state machines, if complexity warrants).
Don't rebuild retry logic, step persistence, or durability. That's solved. Focus on the domain-specific patterns: human gates, conditional branching, parallel analysis with partial-failure tolerance.
Sometimes the best architecture is knowing what not to build.
This approach emerged from building a psychological analysis pipeline that needed parallel LLM calls, human review gates, and production-grade durability. The lesson: use Convex Workflows for the hard stuff, build thin DSLs for the domain-specific stuff.