This tutorial walks through the full Sepurux workflow for an OpenAI agent: instrument, record, replay, and inspect. It takes about five minutes if you already have an OpenAI agent running.
Prerequisites
- A Sepurux account and project (sign up free at app.sepurux.dev)
- Your SEPURUX_API_KEY and SEPURUX_PROJECT_ID from the dashboard
- An OpenAI agent you want to test — any Python script that calls chat.completions.create on an openai.OpenAI() client works
Step 1: Install the SDK
pip install sepurux
Step 2: Wrap your agent with instrumentation
Sepurux provides an OpenAI wrapper that intercepts model calls and records them as structured trace events. Replace your openai.OpenAI() client with the instrumented version:
import os
import openai
from sepurux import SepuruxClient
from sepurux.integrations.openai import instrument_openai
client = SepuruxClient.from_env()
# Your agent logic — unchanged except for the instrumented client
with client.trace("support_triage", {"ticket_id": "t-1042"}) as trace:
    ai = instrument_openai(openai.OpenAI(), recorder=trace)
    response = ai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a support triage agent."},
            {"role": "user", "content": "Ticket t-1042: customer reports payment declined."},
        ],
    )
    print(response.choices[0].message.content)
    print("Trace ID:", trace.trace_id)
Set your credentials via environment variables or pass them directly:
export SEPURUX_API_KEY=your-api-key
export SEPURUX_PROJECT_ID=your-project-id
Run your agent once. Sepurux records the trace — model calls, tool calls, arguments, and results — and uploads it on context exit. You'll see the trace appear in the dashboard under Traces within a few seconds.
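For illustration, a from_env-style loader amounts to reading those two environment variables. This is a hypothetical helper, not the SDK's implementation of SepuruxClient.from_env(); only the variable names are taken from the exports above:

```python
import os

def load_sepurux_credentials() -> dict:
    """Read the two variables a from_env()-style constructor would expect.

    Hypothetical helper for illustration only; the real loader lives in
    the sepurux package. Variable names match the export commands above.
    """
    return {
        "api_key": os.environ["SEPURUX_API_KEY"],
        "project_id": os.environ["SEPURUX_PROJECT_ID"],
    }
```

If either variable is unset, the lookup raises KeyError, which is usually the behavior you want in CI: fail loudly before the agent runs.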
Step 3: Create a reliability campaign
A campaign defines the fault conditions you want to replay against your trace. Create one from the Campaigns page in the dashboard, or via the API:
curl -sS -X POST https://api.sepurux.dev/v1/campaigns \
-H "Content-Type: application/json" \
-H "X-API-Key: $SEPURUX_API_KEY" \
-H "X-Project-Id: $SEPURUX_PROJECT_ID" \
-d '{
"name": "Support Triage — Core Reliability",
"mutation_pack_id": "sepurux.core.reliability",
"eval_set": { "checks": ["status"] }
}'The core reliability pack includes 10+ mutation types covering schema renames, timeouts, value corruptions, and forced errors — the most common failure conditions in production tool-using agents.
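If you script campaign creation instead of using curl, the request body is plain JSON. A minimal sketch that builds the same payload (field names are taken from the curl example above; the endpoint and auth headers are unchanged):

```python
import json

def campaign_payload(name: str, mutation_pack_id: str, checks: list) -> str:
    """Serialize a campaign-creation body matching the curl example above."""
    return json.dumps({
        "name": name,
        "mutation_pack_id": mutation_pack_id,
        "eval_set": {"checks": checks},
    })

body = campaign_payload(
    "Support Triage — Core Reliability",
    "sepurux.core.reliability",
    ["status"],
)
```

POST the resulting string to https://api.sepurux.dev/v1/campaigns with the same X-API-Key and X-Project-Id headers shown above.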
Step 4: Queue a run
Queue a reliability run using your trace ID and campaign ID. The worker replays the trace multiple times, applying a different mutation to each attempt:
curl -sS -X POST https://api.sepurux.dev/v1/runs \
-H "Content-Type: application/json" \
-H "X-API-Key: $SEPURUX_API_KEY" \
-H "X-Project-Id: $SEPURUX_PROJECT_ID" \
-d '{
"trace_id": "<your-trace-id>",
"campaign_id": "<your-campaign-id>"
}'Or use the SDK's assert helper to queue a run and block until it completes — useful in CI:
from sepurux import SepuruxClient
client = SepuruxClient.from_env()
client.assert_run(
    trace_id="<your-trace-id>",
    campaign_id="<your-campaign-id>",
    min_pass_rate=0.85,
    max_unsafe=0,
)
# Exits non-zero if the run fails thresholds
Step 5: Inspect results
Open the run in the dashboard to see the full breakdown, including first-failure tool attribution and the per-attempt mutation trace:
- Pass rate: what percentage of replay attempts succeeded under mutation
- Failed attempts: which specific mutations caused failures, and why
- Tool fragility: which tools your agent depends on most critically
- Policy events: any tool calls that violated configured safety policies
- Recommended fixes: high-confidence suggestions derived from the failure pattern
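The pass rate and policy events above are exactly what the assert helper from Step 4 gates on. A minimal sketch of that threshold logic (an illustration of the semantics, not the SDK's code; defaults match the CI example above):

```python
def run_passes(pass_rate: float, unsafe_events: int,
               min_pass_rate: float = 0.85, max_unsafe: int = 0) -> bool:
    """Mirror the assert_run thresholds: enough replay attempts must
    survive mutation, and policy violations must stay within the limit.

    Illustrative only -- the real checks happen inside the SDK helper.
    """
    return pass_rate >= min_pass_rate and unsafe_events <= max_unsafe
```

So a run with a 0.90 pass rate and zero policy events clears the gate, while a 0.80 pass rate, or any unsafe tool call with max_unsafe=0, fails it.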
What to do with the results
A first run typically surfaces 2–4 fragile points. Common findings: an agent that doesn't handle renamed status fields, a payment flow that doesn't retry on timeout, a tool call that proceeds without checking for an expected field.
Fix the fragilities in your agent, add the run assertion to your CI pipeline, and set a pass rate threshold that reflects your risk tolerance. Future deployments that regress the agent's behavior will fail the gate before they reach production.
The full workflow — instrument, trace, campaign, run, assert — takes under 15 minutes end to end the first time. After that, every commit is tested against the same fault conditions automatically.
