This tutorial walks through the full Sepurux workflow for an OpenAI agent: instrument, record, replay, and inspect. It takes about five minutes if you already have an OpenAI agent running.
Prerequisites
- A Sepurux account and project (sign up free at app.sepurux.dev)
- Your SEPURUX_API_KEY and SEPURUX_PROJECT_ID from the dashboard
- An OpenAI agent you want to test — any Python script that calls chat.completions.create on an openai.OpenAI() client works
Step 1: Install the SDK
pip install sepurux
Step 2: Wrap your agent with instrumentation
Sepurux provides an OpenAI wrapper that intercepts model calls and records them as structured trace events. Replace your openai.OpenAI() client with the instrumented version:
import os
import openai
from sepurux import SepuruxClient
from sepurux.integrations.openai import instrument_openai
client = SepuruxClient.from_env()
# Your agent logic — unchanged except for the instrumented client
with client.trace("support_triage", {"ticket_id": "t-1042"}) as trace:
    ai = instrument_openai(openai.OpenAI(), recorder=trace)
    response = ai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a support triage agent."},
            {"role": "user", "content": "Ticket t-1042: customer reports payment declined."},
        ],
    )
    print(response.choices[0].message.content)
    print("Trace ID:", trace.trace_id)
Set your credentials via environment variables or pass them directly:
export SEPURUX_API_KEY=your-api-key
export SEPURUX_PROJECT_ID=your-project-id
Run your agent once. Sepurux records the trace — model calls, tool calls, arguments, and results — and uploads it on context exit. You'll see the trace appear in the dashboard under Traces within a few seconds.
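For illustration, a from_env-style loader amounts to reading those two environment variables. This is a hypothetical helper, not the SDK's implementation of SepuruxClient.from_env(); only the variable names are taken from the exports above:

```python
import os

def load_sepurux_credentials() -> dict:
    """Read the two variables a from_env()-style constructor would expect.

    Hypothetical helper for illustration only; the real loader lives in
    the sepurux package. Variable names match the export commands above.
    """
    return {
        "api_key": os.environ["SEPURUX_API_KEY"],
        "project_id": os.environ["SEPURUX_PROJECT_ID"],
    }
```

If either variable is unset, the lookup raises KeyError, which is usually the behavior you want in CI: fail loudly before the agent runs.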
Step 3: Create a reliability campaign
A campaign defines the fault conditions you want to replay against your trace. Create one from the Campaigns page in the dashboard, or via the API:
curl -sS -X POST https://api.sepurux.dev/v1/campaigns \
-H "Content-Type: application/json" \
-H "X-API-Key: $SEPURUX_API_KEY" \
-H "X-Project-Id: $SEPURUX_PROJECT_ID" \
-d '{
"name": "Support Triage — Core Reliability",
"mutation_pack_id": "sepurux.core.reliability",
"eval_set": { "checks": ["status"] }
}'The core reliability pack includes 10+ mutation types covering schema renames, timeouts, value corruptions, and forced errors — the most common failure conditions in production tool-using agents.
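If you script campaign creation instead of using curl, the request body is plain JSON. A minimal sketch that builds the same payload (field names are taken from the curl example above; the endpoint and auth headers are unchanged):

```python
import json

def campaign_payload(name: str, mutation_pack_id: str, checks: list) -> str:
    """Serialize a campaign-creation body matching the curl example above."""
    return json.dumps({
        "name": name,
        "mutation_pack_id": mutation_pack_id,
        "eval_set": {"checks": checks},
    })

body = campaign_payload(
    "Support Triage — Core Reliability",
    "sepurux.core.reliability",
    ["status"],
)
```

POST the resulting string to https://api.sepurux.dev/v1/campaigns with the same X-API-Key and X-Project-Id headers shown above.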
Step 4: Queue a run
Queue a reliability run using your trace ID and campaign ID. The worker replays the trace multiple times, applying a different mutation to each attempt:
curl -sS -X POST https://api.sepurux.dev/v1/runs \
-H "Content-Type: application/json" \
-H "X-API-Key: $SEPURUX_API_KEY" \
-H "X-Project-Id: $SEPURUX_PROJECT_ID" \
-d '{
"trace_id": "<your-trace-id>",
"campaign_id": "<your-campaign-id>"
}'Or use the SDK's assert helper to queue a run and block until it completes — useful in CI:
from sepurux import SepuruxClient
client = SepuruxClient.from_env()
client.assert_run(
    trace_id="<your-trace-id>",
    campaign_id="<your-campaign-id>",
    min_pass_rate=0.85,
    max_unsafe=0,
)
# Exits non-zero if the run fails thresholds
Step 5: Inspect results
Open the run in the dashboard to see the full breakdown, including first-failure tool attribution and the per-attempt mutation trace:
- Pass rate: what percentage of replay attempts succeeded under mutation
- Failed attempts: which specific mutations caused failures, and why
- Tool fragility: which tools your agent depends on most critically
- Policy events: any tool calls that violated configured safety policies
- Recommended fixes: high-confidence suggestions derived from the failure pattern
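The pass rate and policy events above are exactly what the assert helper from Step 4 gates on. A minimal sketch of that threshold logic (an illustration of the semantics, not the SDK's code; defaults match the CI example above):

```python
def run_passes(pass_rate: float, unsafe_events: int,
               min_pass_rate: float = 0.85, max_unsafe: int = 0) -> bool:
    """Mirror the assert_run thresholds: enough replay attempts must
    survive mutation, and policy violations must stay within the limit.

    Illustrative only -- the real checks happen inside the SDK helper.
    """
    return pass_rate >= min_pass_rate and unsafe_events <= max_unsafe
```

So a run with a 0.90 pass rate and zero policy events clears the gate, while a 0.80 pass rate, or any unsafe tool call with max_unsafe=0, fails it.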
What to do with the results
A first run typically surfaces 2–4 fragile points. Common findings: an agent that doesn't handle renamed status fields, a payment flow that doesn't retry on timeout, a tool call that proceeds without checking for an expected field.
Fix the fragilities in your agent, add the run assertion to your CI pipeline, and set a pass rate threshold that reflects your risk tolerance. Future deployments that regress the agent's behavior will fail the gate before they reach production.
The full workflow — instrument, trace, campaign, run, assert — takes under 15 minutes end to end the first time. After that, every commit is tested against the same fault conditions automatically.
