How Tilltrend works
An AI analyst is only useful if you can trust the numbers. Tilltrend's design starts from the assumption that you shouldn't have to trust — you should be able to check.
The Ask flow
You type a question — "What was total revenue?", "Compare June and July 2013 revenue", "Which countries drive revenue?". What happens next is deliberately split in two:
- The model chooses, it never computes. The AI proposes a tool call against a closed set of governed metrics (revenue, orders, AOV, monthly revenue, cohort retention, customer LTV, product performance, repeat rate). Invalid proposals are rejected and audited; the model gets one chance to correct itself, then the system refuses deterministically.
- The deterministic layer computes. The metric's canonical SQL — written down once, in a governed definitions document — runs against the warehouse over a read-only connection. The result is written into a receipt: the SQL, the parameters, the rows, the warehouse build identity, and content hashes over all of it.
The answer you see has two visibly separate sections. RECEIPTED FACTS is rendered by code directly from the stored receipts — the model cannot edit it. INTERPRETATION is model prose, labeled "model, not receipted", and a post-check verifies every number in it appears in the receipted facts. If the model invents a number, the draft is rejected and regenerated; two strikes and a deterministic fallback ships instead. This has caught a real violation in testing — the bad draft never shipped.
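One way to picture the post-check is as a set difference over the numbers found in each section. The sketch below assumes a simple regex tokenizer and hypothetical function names (`numbers_in`, `unreceipted_numbers`, `select_answer`); the actual check may normalize numbers differently.

```python
import re
from decimal import Decimal

# Digits with optional thousands separators and an optional decimal part.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def numbers_in(text):
    """Return every numeric value in text; Decimal makes 1,234.50 equal 1234.5."""
    return {Decimal(m.group().replace(",", "")) for m in NUMBER.finditer(text)}


def unreceipted_numbers(interpretation, receipted_facts):
    """Numbers the model wrote that do not appear anywhere in the facts."""
    return numbers_in(interpretation) - numbers_in(receipted_facts)


def select_answer(drafts, receipted_facts, fallback, max_strikes=2):
    """Ship the first clean draft; after max_strikes bad drafts, ship the fallback."""
    for draft in drafts[:max_strikes]:
        if not unreceipted_numbers(draft, receipted_facts):
            return draft
    return fallback
```

For example, with facts of `"revenue: 1,234.50 | orders: 17"`, a draft saying "Revenue grew 12%" is rejected because 12 is not a receipted number.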
Receipts and replay, simply
Every receipted fact carries a receipt id and a hash, for example:

    rcp_c60c017f43c346328383448a66088735 | sha256:371834e1a0a6100a

Replaying a receipt re-executes its exact SQL against the warehouse, with no model anywhere in the loop, and grades the result:
| Verdict | Meaning |
|---|---|
| MATCH | Same SQL, same data, same numbers. The answer still holds. |
| STALE | The warehouse content changed since the receipt was written; the receipt was honest for the data it saw. |
| FAIL | The receipt does not survive re-execution — including receipts someone edited to lie coherently. |
| ERROR | The receipt file is malformed; the grader refuses to grade rather than guess. |
This is tested adversarially: the verification gate doctors receipts five different ways (including editing the rows, recomputing the hashes coherently, tampering with derived values, and deleting caveats) and asserts that each forgery fails with exactly the predicted signature. Commands and observed results are on the evidence page.
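The four verdicts can be read as a short decision procedure. The sketch below is an assumption about how such a grader could work, not the product's implementation: the receipt field names, the canonical-JSON hashing, and the use of a build fingerprint to tell STALE from FAIL are all hypothetical.

```python
import hashlib
import json

REQUIRED = ("sql", "params", "rows", "build_fingerprint", "hash")


def content_hash(payload):
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def grade(receipt, execute, current_fingerprint):
    """Re-run a receipt's SQL via execute(sql, params) and return a verdict."""
    if not isinstance(receipt, dict) or any(k not in receipt for k in REQUIRED):
        return "ERROR"   # malformed: refuse to grade rather than guess
    body = {k: receipt[k] for k in REQUIRED if k != "hash"}
    if content_hash(body) != receipt["hash"]:
        return "FAIL"    # stored content no longer matches its own hash
    rows = execute(receipt["sql"], receipt["params"])
    if rows == receipt["rows"]:
        return "MATCH"
    if current_fingerprint != receipt["build_fingerprint"]:
        return "STALE"   # the data changed, so different rows are expected
    return "FAIL"        # same data, different rows: a coherently re-hashed edit
```

The last branch is why recomputing the hashes does not rescue a forged receipt: the hash check passes, but re-execution against unchanged data still disagrees with the stored rows.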
Honest refusal
When the data to answer a question doesn't exist, Tilltrend says so — in governed text quoted from the metric documentation, not model improvisation. Verbatim, from a real run on 2026-06-13:
    python app\scripts\ask.py "Show me the conversion funnel"

    RECEIPTED FACTS
    ===============
    (no receipts this turn)

    REFUSAL (governed text, not model prose)
    ========================================
    Not yet supported (session_funnel), per app/docs/metrics.md: visits → product views → add-to-cart → checkout → purchase conversion. Requires an event-stream connector (storefront analytics). No proxy is acceptable: order data alone cannot recover pre-purchase steps.

    governed source: app/docs/metrics.md
No proxy metric, no plausible-sounding guess. Order data alone cannot recover pre-purchase funnel steps, so the product refuses rather than approximates.
The admin app today
Six server-rendered pages exist now. Every number on them is queried live at render time using the canonical SQL from the governed metric definitions, and each KPI is footnoted with its metric id.
| Page | What it shows |
|---|---|
| Overview | Headline KPIs (total revenue, orders, AOV, repeat-rate split, LTV distribution) plus a build-identity panel: which warehouse build produced these numbers, over what date window. |
| Trends | Sales over time at the grain you pick — day, week, month, quarter, or year — as a line/area chart with a running total, a trailing moving average, and period-over-period (and year-over-year) change. Built on governed time-series views (window functions over the fact table); a period with no sales reads as an honest 0, partial edge periods are flagged, and a change with no prior period says so rather than inventing one. |
| Products | Top products by revenue (server-side SVG bar chart) with a units / orders / distinct-buyers table and a category-mix breakdown, each figure from the governed product-performance metric. |
| Cohorts | The full cohort-retention matrix; sparse cells are labeled "zero, not missing", with the governed cohort definition quoted on the page. |
| Ask | A chat surface. Renders the exact two-section answer — receipted facts with receipt ids and hashes, interpretation visibly tagged as model prose — and, beneath every answer, the exact SQL the deterministic layer ran, a per-receipt replay command, and the post-check status. |
| Audit | The append-only audit log: every query, every tool call, every rejection and refusal, with the warehouse build id on each row. |
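The Trends row in the table above describes three derived series and two honesty rules. The page itself is built on SQL window functions; the Python below is only an illustrative equivalent, with a hypothetical `trend_rows` helper and an assumed three-period moving-average window.

```python
def trend_rows(periods, sales, window=3):
    """Build trend rows for ordered period labels.

    sales maps a period label to its amount; a label missing from sales
    means the period had no sales.
    """
    rows, running, history = [], 0, []
    for label in periods:
        amount = sales.get(label, 0)   # no sales reads as 0, not as a gap
        running += amount
        history.append(amount)
        tail = history[-window:]       # trailing window, shorter at the start
        prior = history[-2] if len(history) > 1 else None
        # With no prior period (or a prior of 0) there is no honest
        # percentage change, so report None instead of inventing one.
        change = None if prior in (None, 0) else (amount - prior) / prior
        rows.append({
            "period": label,
            "amount": amount,
            "running_total": running,
            "moving_average": sum(tail) / len(tail),
            "change": change,
        })
    return rows
```

Called with periods `["2013-05", "2013-06", "2013-07"]` and sales recorded only for May and July, June comes back with an amount of 0, and July reports no change because its prior period is 0.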
What deliberately doesn't exist: there are no Funnel, Experiments, or Insights pages — not even mock ones. The nav carries a single greyed, non-clickable note: "Funnel — requires event data". A page ships when the capability is real, not before.
Not supported yet
- Session funnel (visits → views → cart → checkout → purchase) — requires an event-stream connector. Refused honestly today, as shown above.
- Promo performance (discount uptake, promo ROI) — requires a promotions/discounts source; the current demo extract has none.
- Store connectors — the warehouse currently builds from CSV extracts. A Shopify connector is planned as the first integration; it does not exist yet, and we won't describe it as if it did.
Architecture, for technical readers
A real layered warehouse
Store data lands in a medallion warehouse on PostgreSQL: bronze (raw, as extracted), silver (cleansed — deduplication, invalid-date nulling, amount repairs, all rules documented), gold (a star schema plus analytics views: monthly revenue, cohort retention, customer LTV, product performance, repeat rate). Rebuilds are single-transaction and reproducible: every build records its identity and a deterministic content fingerprint, so two builds of the same data are provably identical. A 33-check quality suite guards each layer, and the whole warehouse was ported DuckDB → PostgreSQL with every metric verified exact across engines.
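A deterministic content fingerprint means the hash depends only on what the tables contain, not on row order or build time. The following is a minimal sketch under that assumption; `table_fingerprint` and `build_fingerprint` are hypothetical names, and the real fingerprint may be computed inside the database rather than in Python.

```python
import hashlib


def table_fingerprint(rows):
    """Hash a table's rows in sorted order so physical row order is irrelevant."""
    digest = hashlib.sha256()
    for line in sorted(repr(tuple(row)) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def build_fingerprint(tables):
    """Combine per-table fingerprints, visiting tables in name order."""
    digest = hashlib.sha256()
    for name in sorted(tables):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator between the name and its fingerprint
        digest.update(table_fingerprint(tables[name]).encode("ascii"))
    return digest.hexdigest()
```

Two builds that load the same rows in a different order produce the same fingerprint, while a change to any single value produces a different one.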
Governed metrics, one source of truth
Every metric the agent can cite is defined once — id, plain-language definition, exact formula, grain, lineage, canonical SQL, and real caveats (including the dataset's known gaps, like 19 invalid-date sales lines worth 4,992 of revenue, documented rather than hidden). Definitions change only via dated decision entries. If a question needs a formula that isn't governed, that's a new-definition request — not a judgment call the model gets to make.
The model never touches data
The law of the system: the model proposes tool calls and writes prose; everything else is deterministic code. SQL comes from the governed registry, runs over a connection that is read-only twice over (database-level read-only transactions plus an SQL gate allowing single SELECT statements only), with statement timeouts and row caps. Every call, rejection, and refusal is written to an append-only audit log. The receipts machinery itself is tested by an adversarial verification gate — interception tests prove the receipt contains the very objects the query executed, fault-injection proves forgeries fail, renderer tests prove what's stored is what's shown.
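The "single SELECT statements only" gate can be illustrated with a deliberately conservative string check. This is a naive sketch with a hypothetical `passes_sql_gate` function; a production gate would more likely use a real SQL parser, and nothing here is taken from Tilltrend's code.

```python
import re

_LEADING_SELECT = re.compile(r"\s*select\b", re.IGNORECASE)


def passes_sql_gate(sql):
    """Return True only for text that looks like one plain SELECT statement."""
    body = sql.strip()
    if body.endswith(";"):
        body = body[:-1]          # allow one trailing terminator
    if ";" in body:
        return False              # any other semicolon may start a second statement
    if "--" in body or "/*" in body:
        return False              # comments could hide text from a reviewer
    return _LEADING_SELECT.match(body) is not None
```

The check errs toward rejection: a semicolon inside a string literal fails the gate even though it is harmless. A text check like this also cannot see everything a SELECT might do, which is one reason to pair it with database-level read-only transactions as the second, independent layer.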
All of the above is checkable: every claim is listed with its command on the evidence page.