
Instrument Your OpenAI Agent in 5 Minutes

A step-by-step walkthrough: add Sepurux to an OpenAI agent, record a trace, run a reliability campaign, and see results in the dashboard.

This tutorial walks through the full Sepurux workflow for an OpenAI agent: instrument, record, replay, and inspect. It takes about five minutes if you already have an OpenAI agent running.

Prerequisites

  • A Sepurux account and project (sign up free at app.sepurux.dev)
  • Your SEPURUX_API_KEY and SEPURUX_PROJECT_ID from the dashboard
  • An OpenAI agent you want to test (any Python script that calls chat.completions.create on an openai.OpenAI() client works)

Step 1: Install the SDK

bash
pip install sepurux

Step 2: Wrap your agent with instrumentation

Sepurux provides an OpenAI wrapper that intercepts model calls and records them as structured trace events. Replace your openai.OpenAI() client with the instrumented version:

python
import os
import openai
from sepurux import SepuruxClient
from sepurux.integrations.openai import instrument_openai

client = SepuruxClient.from_env()

# Your agent logic — unchanged except for the instrumented client
with client.trace("support_triage", {"ticket_id": "t-1042"}) as trace:
    ai = instrument_openai(openai.OpenAI(), recorder=trace)

    response = ai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a support triage agent."},
            {"role": "user", "content": "Ticket t-1042: customer reports payment declined."},
        ],
    )
    print(response.choices[0].message.content)

print("Trace ID:", trace.trace_id)

Set your credentials via environment variables or pass them directly:

bash
export SEPURUX_API_KEY=your-api-key
export SEPURUX_PROJECT_ID=your-project-id

Run your agent once. Sepurux records the trace — model calls, tool calls, arguments, and results — and uploads it on context exit. You'll see the trace appear in the dashboard under Traces within a few seconds.

Step 3: Create a reliability campaign

A campaign defines the fault conditions you want to replay against your trace. Create one from the Campaigns page in the dashboard, or via the API:

bash
curl -sS -X POST https://api.sepurux.dev/v1/campaigns \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $SEPURUX_API_KEY" \
  -H "X-Project-Id: $SEPURUX_PROJECT_ID" \
  -d '{
    "name": "Support Triage — Core Reliability",
    "mutation_pack_id": "sepurux.core.reliability",
    "eval_set": { "checks": ["status"] }
  }'

The core reliability pack includes 10+ mutation types covering schema renames, timeouts, value corruptions, and forced errors — the most common failure conditions in production tool-using agents.
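To make the fault model concrete, here is a hypothetical sketch of what a schema-rename mutation does to a tool result before the agent sees it. This is not Sepurux's internal code; the function and field names are illustrative only.

```python
# Hypothetical illustration of one mutation type from the core pack:
# a schema rename, where the replay worker hands the agent a tool result
# whose "status" field has been renamed. Not Sepurux's actual implementation.

def apply_schema_rename(payload: dict, old_key: str, new_key: str) -> dict:
    """Return a copy of the tool result with one field renamed."""
    mutated = dict(payload)
    if old_key in mutated:
        mutated[new_key] = mutated.pop(old_key)
    return mutated

original = {"status": "declined", "amount_cents": 4200}
mutated = apply_schema_rename(original, "status", "payment_status")
print(mutated)  # {'amount_cents': 4200, 'payment_status': 'declined'}
```

An agent that reads payload["status"] unconditionally will crash or silently misbehave on the mutated result, which is exactly the class of fragility the replay is designed to surface.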

Step 4: Queue a run

Queue a reliability run using your trace ID and campaign ID. The worker replays the trace multiple times, applying a different mutation to each attempt:

bash
curl -sS -X POST https://api.sepurux.dev/v1/runs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $SEPURUX_API_KEY" \
  -H "X-Project-Id: $SEPURUX_PROJECT_ID" \
  -d '{
    "trace_id": "<your-trace-id>",
    "campaign_id": "<your-campaign-id>"
  }'

Or use the SDK's assert helper to queue a run and block until it completes — useful in CI:

python
from sepurux import SepuruxClient

client = SepuruxClient.from_env()
client.assert_run(
    trace_id="<your-trace-id>",
    campaign_id="<your-campaign-id>",
    min_pass_rate=0.85,
    max_unsafe=0,
)
# Exits non-zero if the run fails thresholds
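Because assert_run exits non-zero on failure, wiring it into CI is just a pipeline step. The workflow below is a hypothetical GitHub Actions sketch; the script name (ci_reliability_gate.py) and secret names are placeholders, not Sepurux conventions.

```yaml
# Hypothetical GitHub Actions job; step and secret names are placeholders.
name: reliability-gate
on: [push]
jobs:
  sepurux-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install sepurux
      - name: Run reliability gate
        env:
          SEPURUX_API_KEY: ${{ secrets.SEPURUX_API_KEY }}
          SEPURUX_PROJECT_ID: ${{ secrets.SEPURUX_PROJECT_ID }}
        # assert_run exits non-zero if thresholds fail, which fails the job
        run: python ci_reliability_gate.py
```

Here ci_reliability_gate.py would contain the assert_run call shown above.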

Step 5: Inspect results

Open the run in the dashboard to see the full breakdown, including the mutation trace behind each attempt:

  • Pass rate: what percentage of replay attempts succeeded under mutation
  • Failed attempts: which specific mutations caused failures, and why
  • Tool fragility: which tools your agent depends on most critically
  • Policy events: any tool calls that violated configured safety policies
  • Recommended fixes: high-confidence suggestions derived from the failure pattern

What to do with the results

A first run typically surfaces 2–4 fragile points. Common findings: an agent that doesn't handle renamed status fields, a payment flow that doesn't retry on timeout, a tool call that proceeds without checking for an expected field.
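A fix for the renamed-status finding might look like the hypothetical defensive accessor below. The alias list and field names are illustrative, not part of any Sepurux or OpenAI API.

```python
# Hypothetical hardening for the "renamed status field" fragility:
# accept any known alias instead of assuming one exact key.
STATUS_ALIASES = ("status", "payment_status", "state")

def read_status(tool_result: dict) -> str:
    """Return the first status-like field found, or raise a clear error."""
    for key in STATUS_ALIASES:
        if key in tool_result:
            return tool_result[key]
    raise ValueError(f"no status field in tool result: {sorted(tool_result)}")

print(read_status({"payment_status": "declined"}))  # declined
```

Failing loudly with a descriptive error is usually better than proceeding with a missing field, since the replay will then attribute the failure to the right tool.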

Fix the fragilities in your agent, add the run assertion to your CI pipeline, and set a pass rate threshold that reflects your risk tolerance. Future deployments that regress the agent's behavior will fail the gate before they reach production.
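When picking a threshold, it helps to translate min_pass_rate into an allowed-failure budget. The attempt counts below are illustrative; the actual number of attempts depends on the campaign's mutation pack.

```python
import math

def allowed_failures(attempts: int, min_pass_rate: float) -> int:
    """How many failed attempts still satisfy the pass-rate threshold."""
    return attempts - math.ceil(min_pass_rate * attempts)

# With min_pass_rate=0.85 from the assert_run example above:
print(allowed_failures(20, 0.85))  # 3
print(allowed_failures(10, 0.85))  # 1
```

With small attempt counts, a single extra failure can flip the gate, so round thresholds like 0.85 behave more strictly than they might appear.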

The full workflow — instrument, trace, campaign, run, assert — takes under 15 minutes end to end the first time. After that, every commit is tested against the same fault conditions automatically.

Ready to test your agent?

Instrument in minutes, replay under fault injection, and block regressions before production.