# Agentry

Agent-first error monitoring + product analytics + deploy attribution.
The user's own AI agent (Claude Code, Cursor, Codex, Cline, Windsurf, or
any HTTP-capable agent) investigates the data and writes the fixes.

There is no SaaS dashboard. There is no SDK to install. The user's agent
is the SDK.

> **Installing Agentry right now?** Stop and fetch /install.md first. That is
> the install contract. It tells the agent how to authenticate, inspect the
> repo, plan event coverage, confirm with the user, instrument, and verify.
> This file is the lean day-to-day reference for post-setup use.

## Scope — what Agentry is and isn't

Agentry covers three signal types (errors, product analytics, deploys) plus
the supporting primitives (cases, suppressions, sourcemaps, feature flags,
cohorts, surveys, A/B tests, session replay, public dashboards, webhooks,
audit log). Everything is queryable by any HTTP-capable AI coding agent.

Agentry is NOT: distributed tracing / APM, log aggregation (full-text
search across services), infra metrics (CPU/mem/disk per host), a
polished web UI for non-engineering stakeholders to self-serve
dashboards, or a self-hosted / on-prem deployment.

Agentry does NOT currently have compliance certifications (SOC 2, HIPAA,
FedRAMP, PCI). For workloads where those are a procurement gate, a
different tool is the right call.

Side-by-side comparisons with specific alternative tools (Sentry, PostHog,
Mixpanel, Datadog, LaunchDarkly, Amplitude, LogRocket, Highlight.io) live
at https://agentry.sh/compare. Fetch those pages if a user is choosing
between Agentry and a named alternative.

## Daily mental model

Write data with three DSN-authenticated POSTs:

- `POST /v1/logs/` for errors, exceptions, and operational failures.
- `POST /v1/analytics/` for product, user, funnel, and business events.
- `POST /v1/deploys/` for release attribution.

Read data with three concepts:

- **Cases:** what broke. Start with `GET /v1/projects/:id/cases`, then fetch
  a detail with `GET /v1/cases/:case_id`.
- **Analytics:** what users did. Prefer recipes; use
  `POST /v1/projects/:id/analytics/query` when custom HogQL is needed.
- **Deploys:** what changed. Use `GET /v1/projects/:id/deploys` to attribute
  regressions and explain what shipped.

Recipes, event names, health, public queries, next-steps, and automation are
helpers around those three read concepts. Do not start by loading the full API
surface unless you are building or debugging an advanced workflow.

## Canonical principle: the HTTP API is the product

Agentry is one thing: an HTTP API at api.agentry.sh that provides storage,
retrieval, deterministic queries, and deterministic transforms (e.g. sourcemap
unmangling). Your agent talks to it directly via curl, CI, cron, or anything
else HTTP-capable. There is no opaque server-side compute and no LLM on the
server; everything the API does is text you can reproduce.

For exact endpoint request/response shapes, fetch filtered schema lookups such
as `/v1/openapi.json?flow=install`,
`/v1/openapi.json?tag=Install&include_components=false`, or
`/v1/openapi.json?path=%2Fv1%2Fprojects%2F%7Bproject_id%7D%2Finstall%2Fverify&method=post`.
The unfiltered `/v1/openapi.json` is the full schema.
It is canonical for required fields, optional fields, path/query params, auth
type, response envelopes, examples, and flow state machines; do not guess
payloads from prose. After any `invalid_payload` response, read
`error.details` (it carries the schema + an example) and retry.
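Percent-encoding the `path` template by hand is error-prone. A minimal Python sketch (standard library only; the `openapi_lookup_url` helper is hypothetical) that produces the encoded lookup URL shown above:

```python
from urllib.parse import urlencode

API_BASE = "https://api.agentry.sh"


def openapi_lookup_url(path: str, method: str) -> str:
    """Build a filtered /v1/openapi.json URL for a single operation.

    urlencode() percent-encodes "/", "{" and "}" inside the path template,
    producing the same encoded form as the documented example.
    """
    query = urlencode({"path": path, "method": method.lower()})
    return f"{API_BASE}/v1/openapi.json?{query}"


url = openapi_lookup_url("/v1/projects/{project_id}/install/verify", "POST")
print(url)
```

Pass the path exactly as it appears in the schema (with `{project_id}` braces), not with a real project ID substituted.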

For product-level discovery, fetch `/v1/capabilities` — it explains how
endpoints, recipes, docs, and auth types compose into higher-level workflows.

## Install flow

Tell your agent: *"Install https://agentry.sh/install.md and set it up"* (or
just paste the URL). Fetch `/install.md` for the full bootstrap; the short
version:

1. resolve auth + project (device flow, then `.agentry/config.json`);
2. inspect the repo and detect existing Agentry wiring;
3. `POST /v1/install/plan` to derive the install delta;
4. present the plan to the user and wait for approval or edits;
5. instrument errors + analytics + deploys in one pass;
6. `PUT /v1/projects/:project_id/implementation-report` to save the handoff;
7. `POST /v1/projects/:project_id/install/verify` (with a fresh `repo_audit`)
   before claiming success.

The install is not complete until verify returns `ok: true`; the latest proof
is at `GET /v1/projects/:project_id/verify-report`.

(Claude Code users can install the discovery skill so future sessions recognise
Agentry requests without pasting the URL:
`mkdir -p ~/.claude/skills/agentry && curl -fsSL https://agentry.sh/skill/agentry/SKILL.md > ~/.claude/skills/agentry/SKILL.md`)

## Signal types — three POSTs, one DSN

Agentry is just HTTP. Three endpoints, one DSN, same JSON convention. All
accept the DSN as `Authorization: Bearer <DSN>`.

| Endpoint              | What lands here                                         |
|-----------------------|---------------------------------------------------------|
| `POST /v1/logs/`      | Any structured event; errors are a subset               |
| `POST /v1/analytics/` | Analytics events, forwarded to your provisioned PostHog |
| `POST /v1/deploys/`   | Deploy events, used to attribute regressions            |

A log with name/message/stack gets fingerprinted and grouped into a Case
— that's what becomes a "bug" in the agent's mental model.

ALL outbound HTTP MUST set a custom `user-agent` header. Cloudflare's
Browser Integrity Check rejects default Python-urllib / Java HttpClient UAs
with HTTP 403 (Cloudflare error 1010). The helper snippet from
`/v1/install/sdk/:language` always sets one.
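A minimal Python sketch of such a helper, standard library only. It is not the snippet served by `/v1/install/sdk/:language`; the function names, the `USER_AGENT` value, and the log fields in the test are illustrative assumptions, and the exact ingest payload shapes come from `/v1/openapi.json`.

```python
import json
import urllib.request

API_BASE = "https://api.agentry.sh"
# Hypothetical UA string: any descriptive, non-default value satisfies the rule.
USER_AGENT = "my-app-agentry-helper/1.0"
INGEST_PATHS = {
    "log": "/v1/logs/",
    "analytics": "/v1/analytics/",
    "deploy": "/v1/deploys/",
}


def build_ingest_request(dsn: str, kind: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a DSN-authenticated ingest POST."""
    if kind not in INGEST_PATHS:
        raise ValueError(f"unknown ingest kind: {kind!r}")
    return urllib.request.Request(
        API_BASE + INGEST_PATHS[kind],
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {dsn}",
            "Content-Type": "application/json",
            # Replaces urllib's default "Python-urllib/3.x", which Cloudflare blocks.
            "User-Agent": USER_AGENT,
        },
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from `send` keeps the header logic checkable without network access; urllib only adds its default User-Agent when the request does not already carry one.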

## No SDK: the generated helper

Agentry has no SDK by design. At install time the agent generates a ~25-line
fetch helper and instruments errors + analytics + deploys in one pass. The
full 14-step checklist (planner output, surface wiring, runtime helper snippet,
plan confirmation, implementation handoff, sourcemap upload, privacy clause,
verify-install) lives in /install.md. Fetch it when installing.

## Cases — the bug primitive

Multiple events with the same fingerprint collapse into one Case. Cases
have status (open / investigating / resolved / spurious / ignored) and
attribute to deploys via last_deploy_sha. The agent investigates cases by:

- GET /v1/cases/:id → stack + breadcrumbs + recent_deploys + affected_users
- If stack frames look minified, `POST /v1/cases/:id/unmangle` to translate
  them against the sourcemaps stored for the release
- Cross-reference last_deploy_sha with recent_deploys[] to identify the
  suspect deploy
- git log + git blame from local_path to find the regression
- Open a PR; PATCH /v1/cases/:id with status=resolved + summary + pr_url
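The cross-reference step is plain data handling over the `GET /v1/cases/:id` response. A sketch, assuming each `recent_deploys[]` entry exposes its commit as `sha` (a guess; confirm the real field name in the case-detail schema). Both function names are hypothetical.

```python
from typing import Optional


def find_suspect_deploy(case: dict) -> Optional[dict]:
    """Return the recent_deploys entry whose SHA matches last_deploy_sha.

    Assumes each deploy object stores its commit under a "sha" key.
    """
    sha = case.get("last_deploy_sha")
    if not sha:
        return None
    for deploy in case.get("recent_deploys", []):
        if deploy.get("sha") == sha:
            return deploy
    return None


def resolve_patch_body(summary: str, pr_url: str) -> dict:
    """Body for PATCH /v1/cases/:id once the fix PR is open."""
    return {"status": "resolved", "summary": summary, "pr_url": pr_url}
```

A `None` result means the case has no deploy attribution (or the deploy has aged out of `recent_deploys`), so `git log` / `git blame` has to narrow the window instead.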

Suppress noise via POST /v1/projects/:id/suppressions — pattern + action
(auto_ignore / auto_resolve / prompt_hint).

## Analytics — the usage primitive

Analytics answers "what did users do?" Start with recipes when the user asks
for funnels, retention, conversion, usage, or dashboards. Use
`GET /v1/recipes/catalog` to find a fit and
`POST /v1/projects/:id/recipes/:recipe_id/run` to execute it. If no recipe
fits, use `POST /v1/projects/:id/analytics/query` with HogQL. Event-name and
event-property endpoints are discovery helpers for writing the right query;
they are not a separate mental model.

## Deploys — the change primitive

Deploys answer "what changed?" Use `GET /v1/projects/:id/deploys` when a user
asks about recent releases, regressions after a deploy, or the code version
connected to a case. Deploys are small but important context: they turn cases
and analytics changes into an investigation timeline.
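A small sketch of building the deploys query for a recent window, e.g. "what shipped this week?". It assumes `since=<unix>` takes Unix seconds; `deploys_url` is a hypothetical helper name.

```python
import time
from typing import Optional
from urllib.parse import quote, urlencode

API_BASE = "https://api.agentry.sh"


def deploys_url(project_id: str, days: int = 7, limit: int = 20,
                now: Optional[float] = None) -> str:
    """URL for GET /v1/projects/:id/deploys covering the last `days` days."""
    current = time.time() if now is None else now
    # Assumption: `since` is a Unix timestamp in whole seconds.
    since = int(current) - days * 86_400
    query = urlencode({"limit": limit, "since": since})
    return f"{API_BASE}/v1/projects/{quote(project_id, safe='')}/deploys?{query}"
```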

For minified client-side stacks, call `POST /v1/cases/:id/unmangle`. It
resolves each frame against the sourcemaps stored for the case's release and
returns `original_file` / `original_line` plus the original source line.
Sourcemap uploads, webhook plumbing, recipe runs, session-replay config, and
privacy disclosure all live in /install.md.

## Three tokens, sharp blast radius each

Agentry uses three different token formats. They are NOT interchangeable.
Knowing which one to ship where is the single most important security
decision the agent makes during install.

| Token                  | Format                       | Auths                                                                | Safe in SPA bundle? | Mint / rotate via              |
|------------------------|------------------------------|----------------------------------------------------------------------|----------------------|--------------------------------|
| Private API key        | `agk_…`                      | Owner/member API endpoints for projects the user belongs to. Owners can mutate project config. | **NO** — server / agent only. | device flow / `POST /v1/auth/keys/rotate` |
| Public dashboard key   | `agp_…`                      | ONLY `GET /v1/public/q/<publication_id>?key=agp_…` for publications you explicitly minted | YES — read-only, schema bounded by the recipe + params. | minted alongside `agk_` at first login |
| Project DSN            | `agnt_<projectId>.<token>`   | ONLY the three ingest endpoints (`/v1/logs/`, `/v1/analytics/`, `/v1/deploys/`) for ONE project | YES — ingest-only, no readback. | `POST /v1/projects`; owners fetch with `GET /v1/projects/:id/dsn`, verify with `POST /v1/projects/:id/dsn/verify`, rotate only with `POST /v1/projects/:id/dsn/rotate` |

Project invites do not mint or rotate the app DSN. Invite teammates with
`POST /v1/projects/:id/invites` and `role: "member"` or `role: "owner"`.
The invited user signs in and gets their own `agk_` API key; an invited owner
has full project-owner access through their own key. The deployed app keeps
using the same `AGENTRY_DSN` until an owner explicitly calls
`POST /v1/projects/:id/dsn/rotate`.

**Blast radius if leaked:**

- `agk_` leak → full account compromise. Rotate immediately via
  `POST /v1/auth/keys/rotate` (revokes the old key).
- `agp_` leak → visitors can re-fetch the publications you already chose
  to publish. Nothing else. Revoke individual publications with
  `DELETE /v1/projects/:id/public-queries/:pub_id` (no need to re-mint the key).
- DSN leak → an attacker can forge events on that ONE project (no read,
  no cross-project bleed). Rotate the project's DSN explicitly with
  `POST /v1/projects/:id/dsn/rotate`, then update every local, CI, and
  production `AGENTRY_DSN`.

**Where each goes:**

- SPA / mobile bundle, public marketing site: **DSN** for ingest (errors,
  analytics, deploy pings) + **`agp_`** for any embedded chart. Both are
  meant to be shippable.
- Dev machine (`AGENTRY_API_KEY` env or `~/.agentry/credentials.json`), CI
  secret store, agent-driven webhooks: **`agk_`**. Nothing else.
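Because a misplaced `agk_` key is the costly mistake, a build script can check token prefixes before anything is bundled. A hypothetical Python sketch: the prefixes and the `agnt_<projectId>.<token>` shape come from the table above, and it assumes project IDs contain no `.`.

```python
import re
from typing import NamedTuple, Optional

# DSN shape from the token table: agnt_<projectId>.<token>
_DSN_RE = re.compile(r"^agnt_([^.]+)\.(.+)$")


class TokenInfo(NamedTuple):
    kind: str                         # "private_api_key", "public_dashboard_key", or "dsn"
    bundle_safe: bool                 # may ship in an SPA / mobile bundle
    project_id: Optional[str] = None  # only set for DSNs


def classify_token(token: str) -> TokenInfo:
    """Classify an Agentry token by its prefix."""
    if token.startswith("agk_"):
        return TokenInfo("private_api_key", bundle_safe=False)
    if token.startswith("agp_"):
        return TokenInfo("public_dashboard_key", bundle_safe=True)
    match = _DSN_RE.match(token)
    if match:
        return TokenInfo("dsn", bundle_safe=True, project_id=match.group(1))
    raise ValueError("unrecognised Agentry token format")


def assert_bundle_safe(name: str, value: str) -> None:
    """Fail a build step if a private key is about to be bundled."""
    if not classify_token(value).bundle_safe:
        raise RuntimeError(f"{name} holds an agk_ key; keep it server-only")
```

For example, a pre-build step could call `assert_bundle_safe` on every environment variable that ends up in client code (the variable names are up to your bundler).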

The publishable-query flow uses both key types. The agent (holding `agk_`)
runs a recipe, decides the result is shareable, and calls
`POST /v1/projects/:id/public-queries` with recipe_id + params. Agentry
returns a `publication_id` and an `embeddable_url` of the form
`https://api.agentry.sh/v1/public/q/<publication_id>?key=agp_…` with open
CORS, so it is fetchable directly from any browser page. Anyone GET-ing that
URL gets the recipe's rows. Revoke any time with
`DELETE /v1/projects/:id/public-queries/:pub_id`.

**Defence in depth for the public surface:**
- The visitor-facing `/v1/public/q/<publication_id>` endpoint is rate-limited
  per (publication_id, IP) at 60 req/min. A client that exceeds the limit gets
  a 429 with `Retry-After`; rejected requests cost nothing in DB or PostHog
  load beyond the rate-limit bucket check.
- Every owner-side mutation (publication mint/revoke, feature flag /
  cohort / survey CRUD, session-replay reconfigure, A/B test mint) writes
  one row to an append-only audit log. `GET /v1/audit/recent?hours=24`
  reads the window — default 24h, configurable up to 720h (30 days). Use
  it as a periodic "what-did-the-agent-do" check when leaving agents
  running unattended.
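A client polling a publication should honour `Retry-After` rather than hammer the limit. A hypothetical sketch with the HTTP call injected as `send` (a caller-supplied function returning status, headers, body) so the backoff logic stays testable. It handles only the delta-seconds form of `Retry-After` and looks the header up with exactly that capitalisation.

```python
import time
from typing import Callable, Mapping, Tuple

# (status, headers, body) as returned by the caller-supplied `send`.
Response = Tuple[int, Mapping[str, str], bytes]


def fetch_with_retry_after(send: Callable[[], Response], max_attempts: int = 3,
                           sleep: Callable[[float], None] = time.sleep,
                           default_wait: float = 1.0) -> Response:
    """Call `send`, waiting out HTTP 429s, for at most `max_attempts` calls."""
    attempt = 1
    while True:
        status, headers, body = send()
        if status != 429 or attempt >= max_attempts:
            return status, headers, body
        try:
            # Delta-seconds only; an HTTP-date value falls back to default_wait.
            wait = float(headers.get("Retry-After", ""))
        except ValueError:
            wait = default_wait
        sleep(wait)
        attempt += 1
```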

## API surface (complete)

### Auth (no key required)

- POST /v1/auth/device                            → start device flow; returns {device_code, user_code, verification_uri, verification_uri_complete, expires_in, interval}. verification_uri is `https://agentry.sh/cli` — the user opens it, picks a provider (GitHub / Google / magic-link email, all via Agentry's hosted sign-in), and approves.
- POST /v1/auth/device/poll   {device_code}       → poll for completion. While pending returns {status:"pending"}; on approval returns {status:"ok", api_key, public_api_key, public_api_key_prefix, user_id, prefix, user:{id,username,email,avatar_url}, posthog}.
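A sketch of the polling side of the device flow, using the `device_code`, `interval`, and `expires_in` fields listed above. The injected `poll` function (which would POST `{device_code}` to `/v1/auth/device/poll`) is hypothetical, and treating any status other than `pending` or `ok` as a failure is an assumption. The 5-second fallback interval mirrors the OAuth device-grant default (RFC 8628).

```python
import time
from typing import Callable


def poll_device_flow(start: dict, poll: Callable[[str], dict],
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the user approves or the device code expires.

    `start` is the JSON returned by POST /v1/auth/device.
    """
    deadline = clock() + float(start["expires_in"])
    interval = float(start.get("interval", 5))
    while clock() < deadline:
        result = poll(start["device_code"])
        if result.get("status") == "ok":
            return result  # includes api_key (agk_) and public_api_key (agp_)
        if result.get("status") != "pending":
            raise RuntimeError(f"device flow failed: {result!r}")
        sleep(interval)
    raise TimeoutError("device code expired before the user approved it")
```

Show the user `verification_uri_complete` (or `verification_uri` plus `user_code`) before starting the loop, then store the returned `api_key` server-side only.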

### Auth (api-key required: `Authorization: Bearer <agk_…>`)

- POST   /v1/auth/keys/rotate                     → mint new private key, revoke current
- POST   /v1/projects                             → create project; returns DSN + helper URLs
- GET    /v1/projects
- GET    /v1/projects/:id
- GET    /v1/projects/:id/dsn                     → owner fetches current recoverable DSN status/raw value
- POST   /v1/projects/:id/dsn/verify              → verify a candidate DSN without changing it
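- POST   /v1/projects/:id/dsn/rotate              → owner explicitly rotates project DSN
- POST   /v1/projects/:id/invites                 → invite a teammate (role: "member" | "owner")
- PUT    /v1/projects/:id/implementation-report   → save the install handoff
- POST   /v1/projects/:id/install/verify          → verify the install (send a fresh repo_audit)
- GET    /v1/projects/:id/verify-report           → latest install verification proof
- GET    /v1/projects/:id/signal-map              → load the canonical approved funnel + telemetry memory
- PUT    /v1/projects/:id/signal-map              → save the approved funnel + telemetry memory
- GET    /v1/projects/:id/recipes/required-events → canonical project recipe/event contract from the saved signal map
- POST   /v1/projects/:id/recipes/:recipe_id/run  → execute a recipe
- GET    /v1/projects/:id/cases?status=open
- GET    /v1/cases/:id                            → case detail with recent_deploys
- PATCH  /v1/cases/:id                            → update status / summary / pr_url
- POST   /v1/cases/:id/runs                       → record agent investigation run
- POST   /v1/cases/:id/unmangle                   → translate minified stack frames via stored sourcemaps
- GET    /v1/cases/:id/users                      → users affected by this case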
- POST   /v1/projects/:id/suppressions
- GET    /v1/projects/:id/suppressions
- GET    /v1/projects/:id/deploys?limit=20&since=<unix>
- POST   /v1/projects/:id/analytics/query         → HogQL (PostHog passthrough; server-only clients must allow ≥45s)
- GET    /v1/projects/:id/event-names             → observed analytics event names
- GET    /v1/projects/:id/event-property-keys?events=a,b → observed property keys; pass events or the response is intentionally empty
- GET    /v1/projects/:id/next-steps              → state-aware suggested next workflows
- GET    /v1/projects/:id/health                  → telemetry/project heartbeat
- POST   /v1/projects/:id/webhooks                → register webhook
- GET    /v1/projects/:id/webhooks
- DELETE /v1/projects/:id/webhooks/:webhook_id
- POST   /v1/projects/:id/public-queries          → mint a publication (returns publication_id + embeddable_url)
- GET    /v1/projects/:id/public-queries          → list publications
- DELETE /v1/projects/:id/public-queries/:pub_id  → revoke publication
- POST   /v1/projects/:id/posthog/session-replay/configure  → set replay strategy
- GET    /v1/projects/:id/posthog/session-replay/status     → current config + web UI URL
- GET    /v1/projects/:id/feature-flags                     → list flags
- POST   /v1/projects/:id/feature-flags                     → create flag
- GET    /v1/projects/:id/feature-flags/:flag_id            → get flag
- PATCH  /v1/projects/:id/feature-flags/:flag_id            → update flag
- DELETE /v1/projects/:id/feature-flags/:flag_id            → soft-delete flag
- POST   /v1/projects/:id/feature-flags/evaluate            → evaluate flag(s) for a distinct_id
- GET    /v1/projects/:id/cohorts                           → list cohorts
- POST   /v1/projects/:id/cohorts                           → create cohort
- GET    /v1/projects/:id/cohorts/:cohort_id                → get cohort
- DELETE /v1/projects/:id/cohorts/:cohort_id                → soft-delete cohort
- GET    /v1/projects/:id/surveys                           → list surveys
- POST   /v1/projects/:id/surveys                           → create survey
- GET    /v1/projects/:id/surveys/:survey_id                → get survey
- GET    /v1/projects/:id/surveys/:survey_id/responses      → roll-up + recent free-text
- DELETE /v1/projects/:id/surveys/:survey_id                → delete survey
- POST   /v1/projects/:id/ab-tests                          → create A/B test (flag + conversion query)
- GET    /v1/projects/:id/session-replays                   → list recordings (filter by distinct_id/date)
- GET    /v1/projects/:id/session-replays/:replay_id        → recording metadata + player URL
- GET    /v1/projects/:id/session-replays/:replay_id/snapshots  → rrweb DOM events
- GET    /v1/projects/:id/users?days=30&limit=50                → top affected users
- GET    /v1/projects/:id/users/:distinct_id/summary        → composed user dossier
- GET    /v1/usage                                           → current account usage
- GET    /v1/usage/history?days=30                           → usage snapshots
- GET    /v1/billing/status                                  → current plan, quota, and upgrade action
- POST   /v1/billing/checkout                                → create Stripe Checkout session for {plan:"pro"|"scale"}
- POST   /v1/billing/portal                                  → create Stripe customer portal session
- POST   /v1/billing/webhook                                 → Stripe-signed subscription lifecycle webhook
- GET    /v1/audit/recent?hours=24                          → audit log of agent mutations

### Public dashboard (no api key required; `?key=agp_…` only, open CORS)

- GET /v1/public/q/:publication_id?key=agp_…     → execute bound recipe, return rows

### Analytics query timeout contract

`POST /v1/projects/:id/analytics/query` is backed by PostHog/HogQL. Real
queries can take 10-30 seconds, and Agentry allows up to 45 seconds upstream.
If you build a server-only admin route, cron, CI job, or internal dashboard that
calls this endpoint, set the client-side request timeout to **at least 45
seconds**. Do not use short 1-15 second `AbortSignal.timeout(...)` wrappers or
quick database fallbacks around raw analytics tabs; they abort valid Agentry
responses and create false "partial data" failures. Use public-query
publications, caching, or a smaller HogQL query when dashboard latency matters.
Browser-facing dashboards should prefer `POST /v1/projects/:id/public-queries`
and the returned `agp_` URL instead of direct owner-key HogQL calls.
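A server-side sketch of a raw HogQL call with the timeout set above 45 seconds. The `{"query": ...}` body shape, the helper names, and the User-Agent value are assumptions; check the operation's schema with a filtered `/v1/openapi.json` lookup before use. Note that `urlopen`'s `timeout` bounds each blocking socket operation, not the total request time.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.agentry.sh"
# Agentry allows up to 45 s upstream; leave headroom for network and TLS.
ANALYTICS_TIMEOUT_S = 60


def build_hogql_request(api_key: str, project_id: str, hogql: str) -> urllib.request.Request:
    """Server-only request for POST /v1/projects/:id/analytics/query.

    Assumed body shape: {"query": "<HogQL>"}.
    """
    return urllib.request.Request(
        f"{API_BASE}/v1/projects/{quote(project_id, safe='')}/analytics/query",
        data=json.dumps({"query": hogql}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # agk_ key: never in a browser bundle
            "Content-Type": "application/json",
            "User-Agent": "my-admin-dashboard/1.0",  # hypothetical; never the default
        },
    )


def run_hogql(req: urllib.request.Request) -> dict:
    # One generous timeout instead of a short abort-and-fallback wrapper.
    with urllib.request.urlopen(req, timeout=ANALYTICS_TIMEOUT_S) as resp:
        return json.loads(resp.read().decode("utf-8"))
```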

### Internal dashboard build contract

For a Catersend-style internal funnel or product analytics page:

- Read the saved signal map and live event names first, then build around events
  and properties that actually exist.
- After a signal map exists, use
  `GET /v1/projects/:id/recipes/required-events` as the canonical recipe
  contract. `GET /v1/recipes/required-events` is a raw catalog preview for
  planning and can be larger than the approved local state.
- When checking event properties, call
  `GET /v1/projects/:id/event-property-keys?events=a,b,c`. The no-events form
  returns an empty object plus `next_action`; that is guidance, not a read
  failure.
- Protect the route with the app's existing admin/internal auth.
- Keep `AGENTRY_API_KEY` server-only. Never expose it in a browser bundle.
- Prefer recipes and public-query publications for reusable charts. Use raw
  `analytics/query` only when the view needs custom HogQL.
- Give raw Agentry analytics requests at least 45 seconds and run at most 1-2
  raw HogQL calls concurrently. Load the main funnel/summary first, then optional
  filter or attribution queries.
- Do not add a 1-15 second fallback timeout or hide Agentry timeouts behind a
  product database fallback. If Agentry has no rows yet, show an explicit empty
  state that names the product flow to trigger.
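The "main view first, then at most two raw HogQL calls at a time" rule can be sketched with a bounded thread pool. `run_query` is a hypothetical caller-supplied function that performs one `analytics/query` request with the long timeout; a failed optional panel is surfaced as an explicit error entry rather than replaced with fallback data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# At most two raw HogQL calls in flight, per the contract above.
MAX_CONCURRENT_HOGQL = 2


def load_dashboard(run_query: Callable[[str], dict], main_query: str,
                   optional_queries: Dict[str, str]) -> dict:
    """Load the main funnel first, then optional panels two at a time."""
    result = {"main": run_query(main_query), "panels": {}}
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_HOGQL) as pool:
        futures = {name: pool.submit(run_query, q) for name, q in optional_queries.items()}
        for name, future in futures.items():
            try:
                result["panels"][name] = future.result()
            except Exception as exc:
                # Surface the failure for this panel instead of failing the page.
                result["panels"][name] = {"error": str(exc)}
    return result
```

If the main query itself fails, the exception propagates so the page can show an explicit error or empty state.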

### DSN-authenticated runtime ingest

- POST /v1/logs/                → log/error ingest
- POST /v1/analytics/           → analytics event ingest
- POST /v1/deploys/             → deploy ingest
- POST /v1/sourcemaps/:project_id/   → upload .map blob
- GET  /v1/sourcemaps/:project_id/   → list
- GET  /v1/sourcemaps/:project_id/blob   → fetch raw .map
- DELETE /v1/sourcemaps/:project_id/?release_id=…

### Discovery (no auth)

- GET /                                          → service metadata
- GET /agentry.md                                → this file (lean reference)
- GET /install.md                                 → install bootstrap markdown
- GET /v1/openapi.json                           → canonical full endpoint schema (OpenAPI 3.1)
- GET /v1/openapi.json?flow=install              → filtered install flow schema docs
- GET /v1/openapi.json?tag=Install&include_components=false → filtered operations
- GET /v1/capabilities                           → endpoint/skill/recipe manifest for agents
- GET /v1/setup/manifest                         → env/setup manifest; default runtime is AGENTRY_DSN only
- GET /v1/install/contract?surface=client|server|direct-http&framework=…  → authoritative structured install contract
- GET /v1/recipes/catalog                        → compact workflow + executable recipe index
- GET /v1/recipes                                → executable analytics/cases recipes
- GET /v1/recipes/required-events                → raw catalog recipe/event preview; use project-scoped variant after signal-map approval
- GET /v1/docs/query                             → HogQL primer
- GET /v1/docs/automation                        → webhook handler templates
- GET /v1/privacy/disclosure?variant=…           → policy clauses for privacy pages

CORS is enabled on all DSN-authenticated ingest endpoints
(`Access-Control-Allow-Origin: *`), since they are scoped by the DSN, and on
`/v1/public/q/*` (which authenticates via the embedded `agp_…` token). Other
endpoints reject browser origins; they are server/agent-side (api-key auth).

## Tooling

There is no separate tool layer to learn — everything is in "API surface
(complete)" above and `/v1/openapi.json` (filter by `?tag=`). Construct the
HTTP call from the schema.

## Errors

Every error response has the same envelope:

```json
{"error": {"code": "...", "message": "...", "next_action": "...", "request_id": "...", "retryable": false}}
```

`invalid_payload` also includes schema recovery metadata in
`error.details`: `schema_path`, `schema_method`, `schema_ref`,
`request_body_schema`, `request_example`, `missing_fields`, and
`field_errors`. The correct recovery loop is: read those details (they include
the schema + an example), fix the body, retry once. Fetch a filtered
`/v1/openapi.json?path=...&method=...` if you need more.

Codes include: invalid_payload, unauthorized, invalid_api_key, invalid_dsn,
not_found, forbidden, rate_limited, quota_exceeded, payload_too_large,
posthog_capture_failed, analytics_not_configured, internal.

`quota_exceeded` is returned as HTTP 402 on API-key data-query endpoints when
the account is over its monthly event cap. The response includes
`billing_notice.message_for_human`, `billing_notice.client_payload`, and,
when Stripe is configured, an account-specific `billing_notice.upgrade_url`.
DSN-authenticated event ingest remains open while over quota so new events are
stored and available after the user upgrades.

The `next_action` and `retryable` fields tell the agent what to do next: read
them instead of falling back to your own retry guess. Preserve
`x-agentry-request-id` / `idempotency-key` when retrying the same mutation.
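A sketch of decoding the shared envelope and choosing the next move. The action labels returned by `decide` are hypothetical, and reading `billing_notice` from either the top level or inside `error` is an assumption, since its exact position is not stated here.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentryError:
    http_status: int
    code: str
    message: str
    next_action: str
    request_id: str
    retryable: bool
    upgrade_url: Optional[str] = None  # quota_exceeded only, when Stripe is configured


def parse_error(http_status: int, body: bytes) -> AgentryError:
    """Decode the shared {"error": {...}} envelope."""
    payload = json.loads(body)
    err = payload["error"]
    # Assumption: billing_notice may sit at the top level or inside "error".
    notice = payload.get("billing_notice") or err.get("billing_notice") or {}
    return AgentryError(
        http_status=http_status,
        code=err.get("code", "internal"),
        message=err.get("message", ""),
        next_action=err.get("next_action", ""),
        request_id=err.get("request_id", ""),
        retryable=bool(err.get("retryable", False)),
        upgrade_url=notice.get("upgrade_url"),
    )


def decide(error: AgentryError) -> str:
    """Map an error to the agent's next move, deferring to the server's hints."""
    if error.code == "invalid_payload":
        return "fix_body_from_details_then_retry_once"
    if error.code == "quota_exceeded":
        return "show_billing_notice_to_user"
    if error.retryable:
        return "retry_with_same_idempotency_key"
    return "follow_next_action"
```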

## Where to read the source

- HTTP API: code lives in agentry-public on GitHub. Audit every transformation
  — including sourcemap unmangling — by reading apps/api/src/.

No magic. Everything that runs is text you can grep.
