Evidence
The rule this site is built on: no claim without a receipt. Every factual statement about the product appears below with the exact command that reproduces it and the result we observed. Anything we can't prove yet is framed as design intent elsewhere on this site — never as achievement.
The dataset, honestly
Tilltrend is pre-launch, so there is no merchant data. The demo warehouse is built from a public training dataset — CRM/ERP extracts of bicycle-and-accessories sales, order dates 2010-12-29 to 2014-01-28: 60,398 sales lines, 27,659 orders, 18,484 customers, 295 products, total revenue 29,356,250. Every number on this site describes that dataset, not any real business. The dataset's own defects (19 invalid-date sales lines carrying 4,992 of revenue; 7 products with a missing category) are documented in the governed metric definitions and allow-listed in the quality checks at exactly their documented magnitude — they fail the build if they drift.
All commands below run from the repository root against the built demo warehouse (PostgreSQL; python app\warehouse\build.py builds it from the seeded CSVs in about two seconds). Observation dates record when we last ran each command.
Claims and proofs
Claim 1 — Every number in an answer is receipted, and the receipt replays without the model
- Reproduce
python app\scripts\ask.py "What was total revenue?"
python app\scripts\replay.py rcp_c60c017f43c346328383448a66088735
- What you should observe
- The answer's RECEIPTED FACTS section shows total_revenue = 29,356,250 with a receipt id and hash. Replaying that receipt re-executes the same SQL with no model in the loop and grades it. (Your own run produces its own receipt id; replay whatever id your run prints.)
- Observed
- 2026-06-13: verdict MATCH, with checks receipt_hash=ok, internal_result_hash=ok, reexecution_hash=ok, row_count=ok, derived=ok. Post-check on the model prose: PASS, attempts=1.
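To make "replays without the model" concrete, here is a minimal sketch of what a receipt and a model-free replay could look like. It assumes a receipt stores the executed SQL, its result rows, a hash of those rows, and a hash over the whole receipt, and it uses an in-memory SQLite table as a stand-in for the PostgreSQL warehouse. The names make_receipt and replay, the field layout, and the canonical-JSON SHA-256 hashing are illustrative assumptions rather than the actual receipt format; the derived check is omitted.

```python
import hashlib
import json
import sqlite3


def digest(obj) -> str:
    # Canonical JSON (sorted keys) so identical content always hashes identically.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def make_receipt(conn, metric, sql):
    # Hypothetical receipt layout: the executed SQL, its rows, and two hashes.
    rows = [list(r) for r in conn.execute(sql).fetchall()]
    body = {"metric": metric, "sql": sql, "rows": rows, "result_hash": digest(rows)}
    return {**body, "receipt_hash": digest(body)}


def replay(conn, receipt):
    # Grade the stored receipt using only the database; no model is involved.
    body = {k: v for k, v in receipt.items() if k != "receipt_hash"}
    fresh = [list(r) for r in conn.execute(receipt["sql"]).fetchall()]
    checks = {
        "receipt_hash": digest(body) == receipt["receipt_hash"],
        "internal_result_hash": digest(receipt["rows"]) == receipt["result_hash"],
        "reexecution_hash": digest(fresh) == receipt["result_hash"],
        "row_count": len(fresh) == len(receipt["rows"]),
    }
    return ("MATCH" if all(checks.values()) else "FAIL"), checks


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)", [(100,), (250,)])
receipt = make_receipt(conn, "total_revenue", "SELECT SUM(amount) FROM sales")
verdict, checks = replay(conn, receipt)
print(verdict)  # MATCH
```

The design point the sketch tries to capture: every check compares stored values against either the receipt itself or a fresh execution, so the grade never depends on model output.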
Claim 2 — A doctored receipt fails replay
- Reproduce
python app\scripts\run_c8_gate.py
The fault-injection suite (app\tests\test_c8_fault_injection.py) forges receipts five ways: edited rows; edited rows with the result hash recomputed to match; a fully coherent forgery with every hash recomputed; doctored derived values under a coherent hash; deleted caveats.
- What you should observe
- Each forgery fails with exactly its predicted signature. For example, the fully coherent forgery is caught by re-execution (reexecution_hash): the lie is internally consistent, but the warehouse disagrees. Malformed receipt files grade ERROR instead of crashing the grader.
- Observed
- 2026-06-13: PASS. All five doctoring layers caught, no false MATCH (9 fault-injection tests green). In an earlier end-to-end demo, a doctored receipt copy with coherently recomputed hashes graded FAIL on reexecution_hash.
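Why a coherent forgery still fails can be shown with a small sketch, assuming a receipt that carries its SQL, rows, a result hash, and a receipt hash, and an in-memory SQLite table standing in for the warehouse. The helpers seal and grade and the failure lists they produce are hypothetical; they illustrate the layering, not the real suite's predicted signatures.

```python
import hashlib
import json
import sqlite3


def digest(obj) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def seal(sql, rows):
    # Build a receipt whose hashes are all consistent with the given rows.
    body = {"sql": sql, "rows": rows, "result_hash": digest(rows)}
    return {**body, "receipt_hash": digest(body)}


def grade(conn, receipt):
    # Return the names of failed checks; an empty list means MATCH.
    body = {k: v for k, v in receipt.items() if k != "receipt_hash"}
    fresh = [list(r) for r in conn.execute(receipt["sql"])]
    failed = []
    if digest(body) != receipt["receipt_hash"]:
        failed.append("receipt_hash")
    if digest(receipt["rows"]) != receipt["result_hash"]:
        failed.append("internal_result_hash")
    if digest(fresh) != receipt["result_hash"]:
        failed.append("reexecution_hash")
    return failed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)", [(100,), (250,)])
sql = "SELECT SUM(amount) FROM sales"
honest = seal(sql, [list(r) for r in conn.execute(sql)])

edited = {**honest, "rows": [[999]]}                                 # rows edited, hashes untouched
rehashed = {**honest, "rows": [[999]], "result_hash": digest([[999]])}  # result hash recomputed
coherent = seal(sql, [[999]])                                        # every hash recomputed

print(grade(conn, honest))    # []
print(grade(conn, edited))    # ['receipt_hash', 'internal_result_hash']
print(grade(conn, rehashed))  # ['receipt_hash', 'reexecution_hash']
print(grade(conn, coherent))  # ['reexecution_hash']
```

Each extra layer of forgery silences the hash checks it recomputes, but none of them can change what the warehouse returns, so re-execution remains the backstop.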
Claim 3 — The adversarial verification gate is green: 6/6 suites
- Reproduce
python app\scripts\run_c8_gate.py
- What you should observe
- Six suites run against the live warehouse: interception (across the full metric registry, the SQL and rows in every receipt are, by object identity, the ones the query layer executed), fault injection (claim 2), renderer (stored = shown; mutating results in memory after the call changes nothing in the rendered answer), build-identity verdicts (including a real warehouse rebuild mid-suite), the 33 warehouse quality checks, and the no-placeholder guard. The script exits non-zero if any suite fails.
- Observed
- 2026-06-13: GATE: all 6 suites passed (c8_interception 16.8s, c8_fault_injection 3.7s, c8_renderer 1.7s, c1a_verdicts 4.6s, quality_checks 1.0s, no_placeholders 0.1s). The gate has failed honestly before: its first run filed 9 defects, and the verdict stayed FAIL until the one MAJOR defect was fixed and the gate re-run. The full defect log is public in app\docs\evidence.md.
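The gate's contract (run every suite, time it, exit non-zero if any fails) can be sketched as a small runner. This assumes each suite is an independent process with its own exit code; run_gate and the two stand-in suites are hypothetical, and the real run_c8_gate.py may be structured quite differently.

```python
import subprocess
import sys
import time


def run_gate(suites):
    """Run each (name, argv) suite in order; return 0 only if every suite exits 0."""
    failed, timings = [], []
    for name, argv in suites:
        start = time.perf_counter()
        returncode = subprocess.run(argv).returncode
        timings.append(f"{name} {time.perf_counter() - start:.1f}s")
        if returncode != 0:
            failed.append(name)
    summary = ", ".join(timings)
    if failed:
        print(f"GATE: FAIL, {len(failed)} of {len(suites)} suites failed: {', '.join(failed)} ({summary})")
        return 1
    print(f"GATE: all {len(suites)} suites passed ({summary})")
    return 0


# Stand-in suites: two tiny Python processes, one passing and one failing.
demo_suites = [
    ("passing_suite", [sys.executable, "-c", "pass"]),
    ("failing_suite", [sys.executable, "-c", "import sys; sys.exit(3)"]),
]
gate_exit = run_gate(demo_suites)
```

A real entry point would end with sys.exit(run_gate(...)) so that CI sees the non-zero code. Running every suite rather than stopping at the first failure keeps the report complete.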
Claim 4 — The warehouse passes 33/33 data-quality checks
- Reproduce
python app\scripts\run_quality_checks.py
- What you should observe
- 33 checks over all three layers: primary-key duplicates and NULLs, standardized-value audits, date validity and ordering, sales = quantity × price, surrogate-key uniqueness, fact→dimension referential integrity, and join fan-out guards. Known source gaps are allow-listed at their documented magnitude and fail if they drift.
- Observed
- 2026-06-13: 33/33 checks passed. This suite has also failed honestly: it shipped at 32/33, with the failing check left red on purpose until the underlying defect (implausible birthdates, DQ-1) was cleansed. The defect was filed, fixed, and documented in app\docs\evidence.md.
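The allow-list rule works like this: a known source defect passes only at exactly its documented magnitude, so the check fails whether the defect grows or shrinks. A minimal sketch, assuming each check reports a tuple of measured magnitudes (clean checks default to zero). The check names and tuple layout are hypothetical; the figures are the documented defects from the dataset notes above, and sales_mismatches illustrates the sales = quantity × price rule.

```python
# Documented source defects, allow-listed at exactly their documented magnitude.
# Keys and tuple layout are hypothetical; the figures come from the dataset notes.
ALLOWED = {
    "sales_invalid_order_date": (19, 4992),  # (offending lines, revenue they carry)
    "products_missing_category": (7,),       # (offending products,)
}


def evaluate(check, observed):
    """PASS only if the observed magnitude equals the allow-listed one (default: zero)."""
    expected = ALLOWED.get(check, (0,))
    if tuple(observed) == expected:
        return "PASS"
    return f"FAIL: observed {tuple(observed)}, expected exactly {expected}"


def sales_mismatches(lines):
    """Count (quantity, price, sales) lines where sales != quantity * price."""
    return sum(1 for quantity, price, sales in lines if sales != quantity * price)


print(evaluate("sales_invalid_order_date", (19, 4992)))  # PASS
print(evaluate("products_missing_category", (8,)))       # FAIL: observed (8,), expected exactly (7,)
print(evaluate("sales_equals_qty_x_price", (sales_mismatches([(2, 10, 20), (1, 5, 5)]),)))  # PASS
```

Exact equality, rather than an upper bound, is what makes a silent fix visible too: if the defect count drops, the check fails until the documentation is updated.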
Claim 5 — Rebuilds are reproducible: same data, same fingerprint
- Reproduce
python app\warehouse\build.py
python app\warehouse\build.py
- What you should observe
- Each build records a build id, per-layer row counts, total revenue, and a deterministic content fingerprint in meta.build_info. Two consecutive builds of the same inputs get distinct build ids but the identical fingerprint: the warehouse is a pure function of its inputs. The fingerprint value itself changes only when the schema or data changes (by design: it hashes every relation), so the claim is the property (same inputs, same fingerprint), not any particular hex value.
- Observed
- On the current build the fingerprint is 37d27fd64a4fa1e2ed420fe9e77620d1, identical across consecutive builds and re-verified by every gate run (the gate rebuilds the warehouse mid-suite and asserts the fingerprint is unchanged while the build id flips). It changed from the earlier 3e02a4b72bbd66a222d0ff9dc808e7fb on 2026-06-13, when the canonical connector seam added three silver tables. The audited business numbers were unchanged; the fingerprint moved because new relations were hashed. That traceable change is the mechanism working, not a regression.
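The build-id / fingerprint split can be sketched as follows, assuming the fingerprint hashes every relation's name, columns, and rows in a canonical order. BLAKE2b with a 16-byte digest is an assumption chosen only because it yields 32 hex characters like the published values; the actual hash function and serialization are not stated here, and fingerprint and build are hypothetical helpers.

```python
import hashlib
import uuid


def fingerprint(relations):
    """Order-independent content hash over every relation.

    relations maps a relation name to (column names, list of row tuples).
    """
    h = hashlib.blake2b(digest_size=16)  # 16 bytes -> 32 hex characters
    for name in sorted(relations):
        columns, rows = relations[name]
        h.update(repr((name, tuple(columns))).encode() + b"\n")
        # Sort rows so physical load order cannot change the fingerprint.
        for row in sorted(rows, key=repr):
            h.update(repr(tuple(row)).encode() + b"\n")
    return h.hexdigest()


def build(relations):
    # A fresh build id every time; the fingerprint depends only on the content.
    return {"build_id": uuid.uuid4().hex, "fingerprint": fingerprint(relations)}


warehouse = {"fact_sales": (["order_no", "sales"], [("SO1", 100), ("SO2", 250)])}
first, second = build(warehouse), build(warehouse)
print(first["build_id"] != second["build_id"])        # True
print(first["fingerprint"] == second["fingerprint"])  # True
```

Because every relation is hashed, adding a table moves the fingerprint even when no existing number changes, which is the behavior described for the connector-seam change.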
Claim 6 — The DuckDB → PostgreSQL port is number-for-number equivalent
- Reproduce
python app\warehouse\build.py
python app\scripts\run_quality_checks.py
Full methodology and the audited DuckDB baseline figures: app\docs\evidence.md and app\CHANGELOG.md (Week 1, engine port entry). The frozen DuckDB baseline file is kept on disk untouched.
- What you should observe
- Every number exact across engines: total revenue 29,356,250 over 27,659 orders; bronze/silver/gold row counts (60,398 fact rows, 18,484 customers, 295 products); repeat split 62.86% / 37.14% with 77.02% of revenue from repeat customers; LTV avg 1,588.10 / median 272.00 / p90 4,825.70 / max 13,294; spot cells like 2013-07 revenue = 1,371,595 and cohort 2013-01 month-1 retention = 5.23%.
- Observed
- 2026-06-12: every compared figure identical between the audited DuckDB baseline and the PostgreSQL warehouse; quality suite 33/33 on the new engine.
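"Number-for-number" means exact equality with no tolerance. A minimal sketch of such a comparison, assuming each engine's figures have already been collected into a dict keyed by figure name. compare_exact and the keys are hypothetical, the values are a few of the published figures, and Decimal is used so money values are compared exactly rather than as floats.

```python
from decimal import Decimal


def compare_exact(baseline, candidate):
    """Return {figure: (baseline, candidate)} for every figure that is not identical."""
    diffs = {}
    for key in sorted(set(baseline) | set(candidate)):
        b, c = baseline.get(key), candidate.get(key)
        if b != c:  # exact equality: no tolerance, and a missing figure also counts
            diffs[key] = (b, c)
    return diffs


# A few of the published figures; Decimal keeps money values exact.
duckdb_baseline = {
    "total_revenue": Decimal("29356250"),
    "orders": 27659,
    "ltv_avg": Decimal("1588.10"),
    "revenue_2013_07": Decimal("1371595"),
    "cohort_2013_01_m1_retention_pct": Decimal("5.23"),
}

# Illustrative stand-ins for what the PostgreSQL queries would return.
postgres_same = dict(duckdb_baseline)
postgres_drifted = {**duckdb_baseline, "ltv_avg": Decimal("1588.11")}

print(compare_exact(duckdb_baseline, postgres_same))     # {}
print(compare_exact(duckdb_baseline, postgres_drifted))  # {'ltv_avg': (Decimal('1588.10'), Decimal('1588.11'))}
```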
Claim 7 — Every headline metric was independently validated: 10/10 confirmed
- Reproduce
- The validation queries are pasted verbatim in app\docs\evidence.md. They are written against the silver tables with natural keys, deliberately not reusing the gold views' SQL, so agreement is a cross-check rather than a tautology. Re-run any of them against the built warehouse.
- What you should observe
- Full-table diffs return 0 disagreeing rows (all 38 monthly revenue rows; all 430 cohort cells), and raw-row traces reconcile to the source — e.g. the top customer's 13 raw lines sum to their 13,294 LTV, line by line; the top product's 620 raw rows sum to 1,373,454 in bronze, pre-cleansing.
- Observed
- 2026-06-12: 10/10 claims CONFIRMED by the adversarial validation pass (mandate: "assume the warehouse is wrong until proven otherwise"). The same pass filed a real defect it found along the way — see claim 4.
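A full-table diff that "returns 0 disagreeing rows" can be sketched as a multiset symmetric difference between a gold-view query and an independently written query over the silver layer. The table, view, and query text below are hypothetical stand-ins on in-memory SQLite; only the technique (two differently written queries, compared row by row in both directions) reflects the method described above.

```python
import sqlite3
from collections import Counter


def disagreeing_rows(conn, query_a, query_b):
    """Multiset symmetric difference of two result sets; [] means full agreement."""
    a = Counter(conn.execute(query_a).fetchall())
    b = Counter(conn.execute(query_b).fetchall())
    return sorted((a - b).elements()) + sorted((b - a).elements())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE silver_sales (order_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO silver_sales VALUES (?, ?)",
    [("2013-07-05", 100), ("2013-07-20", 50), ("2013-08-01", 70)],
)
# Stand-in "gold view": monthly revenue bucketed with substr().
conn.execute(
    "CREATE VIEW gold_monthly_revenue AS "
    "SELECT substr(order_date, 1, 7) AS month, SUM(amount) AS revenue "
    "FROM silver_sales GROUP BY month"
)
gold = "SELECT month, revenue FROM gold_monthly_revenue"
# Independent query: same question, different SQL (strftime instead of substr).
independent = (
    "SELECT strftime('%Y-%m', order_date), SUM(amount) "
    "FROM silver_sales GROUP BY strftime('%Y-%m', order_date)"
)
print(disagreeing_rows(conn, gold, independent))  # []
```

Counting rows in both directions catches missing rows, extra rows, and duplicated rows, which a one-sided EXCEPT would not.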
Claim 8 — When the data doesn't exist, the agent refuses in governed text
- Reproduce
python app\scripts\ask.py "Show me the conversion funnel"
- What you should observe
- No receipted facts, no approximation, no made-up funnel. Instead, a REFUSAL section quotes the governed "Not yet supported" text from app\docs\metrics.md: order data alone cannot recover pre-purchase steps, and no proxy is acceptable. The refusal is written to the audit log.
- Observed
- 2026-06-13: governed refusal, verbatim — "Not yet supported (session_funnel), per app/docs/metrics.md: visits → product views → add-to-cart → checkout → purchase conversion. Requires an event-stream connector (storefront analytics)." (Full output on the product page.)
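One way to guarantee a governed refusal rather than an improvised answer is to route every question through a metric registry in which unsupported metrics carry no SQL, only their governed text. This is a minimal sketch under that assumption: GovernedMetric, REGISTRY, route, the total_revenue SQL, and the table name are hypothetical; mapping the question text to a metric id is assumed to happen upstream; and the refusal text is hard-coded here, whereas the product quotes it from app\docs\metrics.md.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GovernedMetric:
    metric_id: str
    sql: Optional[str]      # None: no governed query exists for this metric
    refusal_text: str = ""  # governed wording, quoted verbatim when refusing


REGISTRY = {
    # Hypothetical SQL and table name, for illustration only.
    "total_revenue": GovernedMetric("total_revenue", "SELECT SUM(sales_amount) FROM gold.fact_sales"),
    "session_funnel": GovernedMetric(
        "session_funnel",
        None,
        "Not yet supported (session_funnel), per app/docs/metrics.md: visits → product views → "
        "add-to-cart → checkout → purchase conversion. Requires an event-stream connector "
        "(storefront analytics).",
    ),
}


def route(metric_id, audit_log):
    """Return ("QUERY", sql) or ("REFUSAL", governed text); never an approximation."""
    metric = REGISTRY[metric_id]
    if metric.sql is None:
        audit_log.append({"event": "refusal", "metric": metric_id, "text": metric.refusal_text})
        return "REFUSAL", metric.refusal_text
    return "QUERY", metric.sql
```

Because the refusal branch returns stored text and there is no fallback query, a missing connector cannot turn into a proxy metric.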
Claim 9 — Every number in the admin UI traces to a governed metric
- Reproduce
python app\scripts\smoke_admin.py
- What you should observe
- 31 deterministic checks: all five admin pages render with the real governed values (total revenue 29,356,250; 2013-07 = 1,371,595; cohort 2013-01 retention 5.23%; the live warehouse fingerprint), the deterministic Ask path returns receipted facts with a receipt id, and the JSON API stays unbroken.
- Observed
- 2026-06-13: SMOKE_ADMIN PASSED — all checks green.
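A smoke check that a rendered page shows the governed values could look like the sketch below. It assumes the page text is available as a string and that amounts render with thousands separators and percentages with two decimals, matching the figures quoted on this page. render, GOVERNED, missing_from_page, and the sample page are hypothetical; the real smoke_admin.py checks are not shown here.

```python
def render(value, kind):
    # Display formats assumed from the figures quoted on this page.
    if kind == "amount":
        return f"{value:,}"
    if kind == "percent":
        return f"{value:.2f}%"
    return str(value)


GOVERNED = {
    "total_revenue": (29356250, "amount"),
    "revenue_2013_07": (1371595, "amount"),
    "cohort_2013_01_month1_retention": (5.23, "percent"),
}


def missing_from_page(page_text, governed=GOVERNED):
    """Names of governed values whose rendered form does not appear in the page."""
    return [name for name, (value, kind) in governed.items() if render(value, kind) not in page_text]


page = "<td>Total revenue</td><td>29,356,250</td><td>2013-07</td><td>1,371,595</td><td>5.23%</td>"
print(missing_from_page(page))  # []
```

This is a presence check on substrings, so it is weaker than matching each value to its labelled element (for example, "5.23%" would also match inside "15.23%"); a stricter version would parse the page and compare values cell by cell.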
What we don't claim
No customers, no testimonials, no revenue-lift numbers, no "stores created" counter — Tilltrend is pre-launch and has none of these. No funnel or promo analytics — the connectors that would make them honest don't exist yet (claim 8 shows the product saying so itself). When any of this changes, the claim lands here first, with its command.
Found a claim on this site without a receipt? That's a bug. Report it and we'll fix it or retract the claim.