Hooktopus

Replay from R2.

Every event is archived in Cloudflare R2 before we acknowledge ingestion. That archive is the safety net: replay lets you re-write any time range to any Active destination.

When you'd use replay

  • BigQuery outage — events kept arriving in R2 but writes failed
  • You rotated your BigQuery service-account (SA) credentials and want to backfill the gap
  • You added a new destination and want to backfill it from R2
  • You're debugging dbt against a known time range and want repeatable input

How it works

The hub UI at /w/[slug]/replay lets you pick:

  • Endpoint (or all)
  • Time range
  • Optional JSON-path predicate (e.g. $.type == "charge.succeeded")
  • Target destination (must be Active)
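The predicate syntax is documented only by the example above. As a rough illustration of what a filter does, here is a minimal Python sketch that understands just the `$.field.subfield == "literal"` equality form. The function name `matches_predicate`, the regex, and the rule that a missing path never matches are assumptions for illustration, not Hooktopus's actual parser.

```python
import json
import re
from typing import Any

# Hypothetical grammar: only `$.a.b == "literal"` with a double-quoted string literal.
_PREDICATE = re.compile(r'^\$((?:\.[A-Za-z_][A-Za-z0-9_]*)+)\s*==\s*"((?:[^"\\]|\\.)*)"$')


def matches_predicate(event: dict[str, Any], predicate: str) -> bool:
    """Return True if the value at the JSON path equals the string literal."""
    m = _PREDICATE.match(predicate.strip())
    if m is None:
        raise ValueError(f"unsupported predicate: {predicate!r}")
    path = m.group(1).split(".")[1:]  # ".data.object" -> ["data", "object"]
    expected = json.loads(f'"{m.group(2)}"')  # unescape the literal the way JSON would
    node: Any = event
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False  # assumption: a missing path never matches
        node = node[key]
    return node == expected
```

Keeping the grammar this narrow makes the sketch easy to reason about; a real filter language would need operators, numbers, and arrays.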

We list the matching R2 objects by prefix, parse the canonical event JSON, and re-publish each event through the destination-writer queue. The same dedupe rules apply: event_id is the unique key in BQ, so a replayed event lands as an upsert, not a duplicate.
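A toy in-memory model makes the upsert property concrete. In this sketch, assuming each archived event carries an `event_id` field, `replay` reads archived events, applies an optional predicate, and upserts each one into a destination keyed on `event_id`. `Destination` and `replay` are hypothetical stand-ins for the destination-writer queue and the BigQuery table, not Hooktopus code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Optional


@dataclass
class Destination:
    """Hypothetical stand-in for a BQ table whose unique key is event_id."""
    rows: dict[str, dict[str, Any]] = field(default_factory=dict)

    def upsert(self, event: dict[str, Any]) -> None:
        # A repeated event_id overwrites the existing row instead of adding a second one.
        self.rows[event["event_id"]] = event


def replay(
    archived: Iterable[dict[str, Any]],
    dest: Destination,
    predicate: Optional[Callable[[dict[str, Any]], bool]] = None,
) -> int:
    """Re-write archived events to dest and return how many were written."""
    written = 0
    for event in archived:
        if predicate is not None and not predicate(event):
            continue  # filtered out by the optional predicate
        dest.upsert(event)
        written += 1
    return written
```

Replaying the same archive twice leaves `dest.rows` the same size, which is the "upsert, not a duplicate" behavior described above.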

Storage layout

R2 objects are keyed by workspace, date and hour, endpoint, and event ID:

```text
<workspace_id>/<YYYY>/<MM>/<DD>/<HH>/<endpoint_name>/<event_id>.json
```

This means a time-range replay is a cheap R2 list operation, not a full scan. A one-hour range pulls one prefix per endpoint; a one-day range pulls 24 per endpoint. R2 bills list calls as Class A operations (roughly $4.50 per million) and per-object reads as Class B operations (roughly $0.36 per million); we don't bill either through.
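Because the hour sits before the endpoint in the key, a time range maps to a predictable list of prefixes. The sketch below generates them for a half-open range `[start, end)`. It assumes the `<HH>` component is in UTC; `hour_prefixes` and `endpoint_prefixes` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def hour_prefixes(workspace_id: str, start: datetime, end: datetime) -> list[str]:
    """Hourly R2 prefixes covering [start, end). Assumes <HH> in the key is UTC."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    # Round start down to the top of its hour so a partial first hour is included.
    cursor = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while cursor < end:
        prefixes.append(f"{workspace_id}/{cursor:%Y/%m/%d/%H}/")
        cursor += timedelta(hours=1)
    return prefixes


def endpoint_prefixes(hourly: list[str], endpoint_name: str) -> list[str]:
    """Narrow each hourly prefix to a single endpoint."""
    return [f"{p}{endpoint_name}/" for p in hourly]
```

Rounding `start` down means the first prefix can hold events from before `start`, so a real replay would still check each event's timestamp.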

Replay via API

The hub UI calls the same endpoint you can hit directly:

```http
POST /v1/replay HTTP/1.1
Host: api.hooktopus.io
Authorization: Bearer hk_live_••••
Content-Type: application/json

{
  "endpoint_id": "ep_stripe_x",
  "from": "2026-05-10T00:00:00Z",
  "to":   "2026-05-11T00:00:00Z",
  "destination_id": "dst_bq_prod",
  "filter": "$.type == \"charge.succeeded\""
}

# Response
{ "ok": true, "job_id": "rj_01970000-...", "estimated_events": 4231 }
```
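For scripting, the same request can be built with Python's standard library. This is a sketch: it assumes the API is served over HTTPS at `api.hooktopus.io` and that a successful call returns the JSON body shown above; `build_replay_request` and `start_replay` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Optional

# Assumption: HTTPS on the host shown in the example request.
REPLAY_URL = "https://api.hooktopus.io/v1/replay"


def build_replay_request(
    api_key: str,
    endpoint_id: str,
    start: str,
    end: str,
    destination_id: str,
    filter_expr: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/replay request."""
    body: dict[str, Any] = {
        "endpoint_id": endpoint_id,
        "from": start,
        "to": end,
        "destination_id": destination_id,
    }
    if filter_expr is not None:
        body["filter"] = filter_expr
    return urllib.request.Request(
        REPLAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def start_replay(req: urllib.request.Request) -> dict[str, Any]:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Building and sending are split so the request can be inspected (or logged without the key) before it goes out.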

Tracking a replay job

```http
GET /v1/replay/rj_01970000-... HTTP/1.1
Host: api.hooktopus.io
Authorization: Bearer hk_live_••••

# Response
{
  "status": "running",         // or "queued" | "completed" | "failed"
  "events_total": 4231,
  "events_written": 3104,
  "started_at": "2026-05-16T20:14:30Z"
}
```
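A client will usually poll this endpoint until the job reaches `completed` or `failed`. The sketch below takes a `fetch_status` callable (in practice it would wrap `GET /v1/replay/<job_id>`) so it stays independent of any HTTP library. The 5-second interval, the one-hour timeout, and the name `wait_for_replay` are assumptions.

```python
import time
from typing import Any, Callable

TERMINAL = {"completed", "failed"}


def wait_for_replay(
    fetch_status: Callable[[], dict[str, Any]],
    poll_seconds: float = 5.0,  # assumed interval
    timeout_seconds: float = 3600.0,  # assumed upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reaches a terminal status, then return its last status body."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"replay still {status['status']!r} after {timeout_seconds}s")
        total = status.get("events_total") or 0
        written = status.get("events_written", 0)
        if total:
            # Progress line, e.g. "running: 73% (3104/4231)"
            print(f"{status['status']}: {100 * written / total:.0f}% ({written}/{total})")
        sleep(poll_seconds)
```

Injecting `sleep` keeps the loop testable without real waiting; callers should still check whether the returned status is `failed`.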

Retention

The R2 archive defaults to 30 days of retention; you can set 1d / 7d / 30d / 90d / never in workspace settings. Replay can only operate on what's still in R2: with 7-day retention, anything older than 7 days is already gone, so a range that lies entirely outside the window (say, 14 days back) returns an empty-range error.
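The retention rule can be expressed as clipping the requested range to the window R2 still holds. This sketch assumes retention is a simple rolling window measured from `now` and that a partially covered range is clipped rather than rejected; `replayable_range` and `EmptyRangeError` are hypothetical names, not Hooktopus's API.

```python
from datetime import datetime, timedelta
from typing import Optional

# Retention settings from workspace settings; None means "never" delete.
RETENTION_DAYS: dict[str, Optional[int]] = {"1d": 1, "7d": 7, "30d": 30, "90d": 90, "never": None}


class EmptyRangeError(ValueError):
    """Hypothetical: raised when no part of the requested range is still archived."""


def replayable_range(
    start: datetime, end: datetime, retention: str, now: datetime
) -> tuple[datetime, datetime]:
    """Clip [start, end) to what a rolling retention window would still hold (assumption)."""
    days = RETENTION_DAYS[retention]
    if days is not None:
        oldest_kept = now - timedelta(days=days)
        start = max(start, oldest_kept)
    if start >= end:
        raise EmptyRangeError(f"nothing archived between {start} and {end}")
    return start, end
```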

Business and Scale plans get longer default retention. If you have a regulatory requirement for long-term raw event storage, set retention to "never" on a plan whose retention budget covers it.

What replay doesn't do

  • Won't re-trigger source-side webhooks. Replay is an internal re-write from R2 — we never call out to Stripe et al.
  • Won't transform. The event JSON we archive is what gets re-written. If you need a transformed re-write, build it as a dbt model.
  • Won't run concurrent replays against the same destination. Replays run one at a time per destination to keep BQ rate limits happy.
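The one-replay-per-destination rule amounts to a per-destination admission lock. The sketch below assumes a second request for a busy destination is queued (the job-status values include `"queued"`) rather than rejected; `DestinationScheduler` is a hypothetical in-process model, and a multi-worker service would need the same rule enforced in shared state.

```python
import threading
from collections import defaultdict, deque
from typing import Optional


class DestinationScheduler:
    """Hypothetical: at most one running replay per destination; others wait FIFO."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._running: dict[str, str] = {}  # destination_id -> running job_id
        self._waiting: defaultdict[str, deque[str]] = defaultdict(deque)

    def submit(self, destination_id: str, job_id: str) -> str:
        """Return "running" if the job starts now, else "queued"."""
        with self._lock:
            if destination_id not in self._running:
                self._running[destination_id] = job_id
                return "running"
            self._waiting[destination_id].append(job_id)
            return "queued"

    def finish(self, destination_id: str) -> Optional[str]:
        """Mark the running job done; promote and return the next queued job, if any."""
        with self._lock:
            self._running.pop(destination_id, None)
            queue = self._waiting[destination_id]
            if queue:
                nxt = queue.popleft()
                self._running[destination_id] = nxt
                return nxt
            return None
```

Replays to different destinations still run in parallel; only jobs sharing a destination wait on each other.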