Hooktopus

dbt integration

The dbt model generator turns observed payload shapes into typed staging models. Here's how it works, how to regenerate without losing edits, and how to make it part of your CI.

How generation works

A schema-observer Worker walks each sampled webhook payload (default 1 in 10) and records, for every JSON path it sees: the observed types, total counts, null counts, sample values, first-seen, and last-seen timestamps. The observations live in D1 in the schema_observations table.
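You can query the observation store directly. A sketch of what that looks like — D1 speaks SQLite, and the column names here are assumptions mirroring the fields described above, not the exact schema:

```sql
-- Column names are illustrative; they follow the fields listed above.
select
  json_path,        -- e.g. '$.data.object.amount'
  observed_types,   -- e.g. 'integer', or 'string,object' for mixed paths
  total_count,
  null_count,
  first_seen,
  last_seen
from schema_observations
where endpoint = 'stripe'
order by total_count desc
limit 10;
```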

When you hit Generate in the hub, we:

  1. Filter to fields seen in ≥X% of events (default 1%)
  2. Walk the tree and assign a SQL type per path based on observed types
  3. Render stg_<endpoint>.sql and _hooktopus__sources.yml
  4. Bundle them into a ZIP and stream it back
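Step 1 can be pictured as a query over the observation counts. A minimal sketch, assuming a hypothetical per-endpoint events table to compute the denominator:

```sql
-- Illustrative only: the ≥1% presence filter from step 1.
-- The events table and endpoint column are assumptions for this sketch.
select json_path
from schema_observations
where endpoint = 'stripe'
  and total_count >= 0.01 * (
    select count(*) from events where endpoint = 'stripe'
  );
```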

Type extraction rules

  • Always numeric → SAFE_CAST(JSON_VALUE(...) AS INT64) or FLOAT64
  • Always string → JSON_VALUE(...)
  • Mixed types → JSON_VALUE(...) with explanatory comment
  • Object → JSON_QUERY(...) (returns the sub-tree)
  • Array → JSON_QUERY(...) (you unnest in downstream models)
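Put together, a generated model looks roughly like the sketch below. The Stripe paths and column names are illustrative examples, not a guaranteed output of the generator:

```sql
-- models/staging/hooktopus/stg_stripe.sql (illustrative sketch)
with source as (
    select payload, received_at
    from {{ source('hooktopus', 'events') }}
)

select
    -- always numeric → safe cast
    safe_cast(json_value(payload, '$.data.object.amount') as int64) as amount_cents,
    -- always string → plain extraction
    json_value(payload, '$.type') as event_type,
    -- mixed types → string extraction with a comment
    json_value(payload, '$.data.object.source') as source_ref,  -- observed as string and object
    -- object → sub-tree kept as JSON
    json_query(payload, '$.data.object.metadata') as metadata,
    -- array → JSON for downstream unnesting
    json_query(payload, '$.data.object.items') as items,
    received_at
from source
```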

Where to put the files

The ZIP contains exactly two files. Drop them into your dbt project at:

text
my-dbt-project/
└── models/
    └── staging/
        └── hooktopus/
            ├── stg_stripe.sql
            └── _hooktopus__sources.yml

What the source file looks like

Hooktopus generates a sources declaration with a freshness check so dbt warns you when events stop arriving. Replace your-project and your_dataset as needed — the generator pre-fills the destination you picked.

models/staging/hooktopus/_hooktopus__sources.yml
version: 2

sources:
  - name: hooktopus
    database: your-project
    schema: your_dataset
    tables:
      - name: events
        identifier: events
        loaded_at_field: received_at
        freshness:
          warn_after: { count: 1, period: hour }
          error_after: { count: 6, period: hour }
        columns:
          - name: event_id
            tests:
              - not_null
              - unique

Regenerating without losing your edits

The hard problem: you edited stg_stripe.sql to add a derived column, and now Stripe added a field and you want to regenerate. Two patterns work:

Pattern A — Override layer (recommended)

Keep stg_stripe.sql regen-clean. Put your custom columns in a layer above it, like stg_stripe_enriched.sql. Regenerate freely; your derived layer is untouched.

models/staging/hooktopus/stg_stripe_enriched.sql
{{ config(materialized='view') }}

select
  *,
  amount_cents / 100.0 as amount_dollars,
  case when event_type like 'charge.%' then 'charge' else 'other' end as category
from {{ ref('stg_stripe') }}

Pattern B — Pin and own

Stop regenerating after the first version. Edit stg_stripe.sql freely. Hooktopus still detects drift and alerts you; you handle the SQL changes manually. Best when your team has strong dbt opinions you don't want overridden.

We never auto-merge regenerated SQL into your repo. v1.1's GitHub App opens a PR — you review, test, and merge it through your normal review process. No surprises.

Wiring into CI

The freshness check we generate fires a dbt source freshness warning if no events arrived in the last hour. Add this to your CI to catch silent endpoint failures:

bash
# In your scheduled CI run, before `dbt run`:
dbt source freshness --select source:hooktopus

Testing the generated model

bash
# Confirm it builds at all
dbt run --select stg_stripe

# Confirm the generated tests pass (freshness runs separately via `dbt source freshness`)
dbt test --select stg_stripe

# Inspect typed columns
dbt show --select stg_stripe --limit 10