Evidence

The rule this site is built on: no claim without a receipt. Every factual statement about the product appears below with the exact command that reproduces it and the result we observed. Anything we can't prove yet is framed as design intent elsewhere on this site — never as achievement.

The dataset, honestly

Tilltrend is pre-launch, so there is no merchant data. The demo warehouse is built from a public training dataset: CRM/ERP extracts of bicycle-and-accessories sales with order dates from 2010-12-29 to 2014-01-28, comprising 60,398 sales lines, 27,659 orders, 18,484 customers, 295 products, and total revenue of 29,356,250. Every number on this site describes that dataset, not any real business. The dataset's own defects (19 invalid-date sales lines carrying 4,992 of revenue; 7 products with a missing category) are documented in the governed metric definitions and allow-listed in the quality checks at exactly their documented magnitude, so the build fails if either figure drifts.

All commands below run from the repository root against the built demo warehouse (PostgreSQL; python app\warehouse\build.py builds it from the seeded CSVs in about two seconds). Observation dates are when we last ran each command.

Claims and proofs

Claim 1 — Every number in an answer is receipted, and the receipt replays without the model

Reproduce

python app\scripts\ask.py "What was total revenue?"
python app\scripts\replay.py rcp_c60c017f43c346328383448a66088735

What you should observe

The answer's RECEIPTED FACTS section shows total_revenue = 29,356,250 with a receipt id and hash. Replaying that receipt re-executes the same SQL with no model in the loop and grades it. (Your own run produces its own receipt id; replay whatever id your run prints.)
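To make the replay idea concrete, here is a minimal sketch of grading a receipt by re-running its SQL with no model involved. It is an illustration under stated assumptions, not the code in app\scripts\replay.py: the receipt layout, the digest helper, and the use of an in-memory SQLite database (instead of the PostgreSQL warehouse) are all hypothetical. Only the check names are taken from the observed output, and the derived check is left out.

```python
import hashlib
import json
import sqlite3


def digest(obj):
    """Stable SHA-256 over a JSON-serialisable object (hypothetical helper)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_receipt(conn, sql):
    """Run the query once and seal the SQL, rows, and hashes into a receipt."""
    rows = [list(r) for r in conn.execute(sql).fetchall()]
    body = {"sql": sql, "rows": rows, "row_count": len(rows),
            "result_hash": digest(rows)}
    return {**body, "receipt_hash": digest(body)}


def replay(conn, receipt):
    """Re-execute the stored SQL and grade the receipt against fresh rows."""
    body = {k: v for k, v in receipt.items() if k != "receipt_hash"}
    fresh = [list(r) for r in conn.execute(receipt["sql"]).fetchall()]
    checks = {
        "receipt_hash": digest(body) == receipt["receipt_hash"],
        "internal_result_hash": digest(receipt["rows"]) == receipt["result_hash"],
        "reexecution_hash": digest(fresh) == receipt["result_hash"],
        "row_count": len(fresh) == receipt["row_count"],
    }
    return ("MATCH" if all(checks.values()) else "FAIL"), checks


# Toy data, not the demo dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("SO1", 100), ("SO2", 250)])

receipt = make_receipt(conn, "SELECT SUM(amount) AS total_revenue FROM sales")
verdict, checks = replay(conn, receipt)
print(verdict, checks)
```

The point of the design is that the verdict depends only on the stored SQL, the stored hashes, and the database, so the same grading can be repeated by anyone holding the receipt.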

Observed

2026-06-13: verdict MATCH, with checks receipt_hash=ok, internal_result_hash=ok, reexecution_hash=ok, row_count=ok, derived=ok. Post-check on the model prose: PASS, attempts=1.

Claim 2 — A doctored receipt fails replay

Reproduce

python app\scripts\run_c8_gate.py

The fault-injection suite (app\tests\test_c8_fault_injection.py) forges receipts five ways: edited rows; edited rows with the result hash recomputed to match; a fully coherent forgery with every hash recomputed; doctored derived values under a coherent hash; deleted caveats.

What you should observe

Each forgery fails with exactly its predicted signature — e.g. the fully coherent forgery is caught by re-execution (reexecution_hash): the lie is internally consistent but the warehouse disagrees. Malformed receipt files grade ERROR instead of crashing the grader.
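The sketch below illustrates why a fully coherent forgery can only be caught by re-execution. It is a toy under assumptions, not the suite in app\tests\test_c8_fault_injection.py: the seal and grade functions, the receipt fields, and the in-memory SQLite database are hypothetical, and the failure signatures it prints belong to this sketch, not to the real grader.

```python
import hashlib
import json
import sqlite3


def digest(obj):
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def seal(sql, rows):
    """Build a receipt whose hashes are all consistent with the given rows."""
    body = {"sql": sql, "rows": rows, "result_hash": digest(rows)}
    return {**body, "receipt_hash": digest(body)}


def grade(conn, receipt):
    """Return (verdict, failed check names); malformed input grades ERROR."""
    try:
        body = {k: receipt[k] for k in ("sql", "rows", "result_hash")}
        fresh = [list(r) for r in conn.execute(receipt["sql"]).fetchall()]
        outcomes = (
            ("receipt_hash", digest(body) == receipt["receipt_hash"]),
            ("internal_result_hash", digest(receipt["rows"]) == receipt["result_hash"]),
            ("reexecution_hash", digest(fresh) == receipt["result_hash"]),
        )
        failed = [name for name, ok in outcomes if not ok]
    except (KeyError, TypeError, sqlite3.Error):
        return "ERROR", []
    return ("FAIL" if failed else "MATCH"), failed


# Toy data, not the demo dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)", [(100,), (250,)])
sql = "SELECT SUM(amount) FROM sales"

honest = seal(sql, [[350]])
edited_rows = {**honest, "rows": [[999]]}   # rows changed, hashes left stale
coherent_forgery = seal(sql, [[999]])       # every hash recomputed to match
malformed = {"rows": []}                    # missing required fields

for label, receipt in [("honest", honest), ("edited rows", edited_rows),
                       ("coherent forgery", coherent_forgery),
                       ("malformed", malformed)]:
    print(label, grade(conn, receipt))
```

In this sketch the coherent forgery passes both internal hash checks and fails only reexecution_hash, because the database still returns 350.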

Observed

2026-06-13: PASS — all five doctoring layers caught, no false MATCH (9 fault-injection tests green). Earlier end-to-end demo: a doctored receipt copy with coherently recomputed hashes → FAIL on reexecution_hash.

Claim 3 — The adversarial verification gate is green: 6/6 suites

Reproduce

python app\scripts\run_c8_gate.py

What you should observe

Six suites against the live warehouse: interception (the SQL and rows in every receipt are, by object identity, the ones the query layer executed — across the full metric registry), fault injection (claim 2), renderer (stored = shown; mutating results in memory after the call changes nothing in the rendered answer), build-identity verdicts (including a real warehouse rebuild mid-suite), the 33 warehouse quality checks, and the no-placeholder guard. Exit code non-zero on any failure.

Observed

2026-06-13: GATE: all 6 suites passed (c8_interception 16.8s, c8_fault_injection 3.7s, c8_renderer 1.7s, c1a_verdicts 4.6s, quality_checks 1.0s, no_placeholders 0.1s). The gate has failed honestly before: its first run filed 9 defects, and the verdict stayed FAIL until the one MAJOR defect was fixed and the gate was re-run. The full defect log is public in app\docs\evidence.md.

Claim 4 — The warehouse passes 33/33 data-quality checks

Reproduce

python app\scripts\run_quality_checks.py

What you should observe

33 checks over all three layers: primary-key duplicates and NULLs, standardized-value audits, date validity and ordering, sales = quantity × price, surrogate-key uniqueness, fact→dimension referential integrity, and join fan-out guards. Known source gaps are allow-listed at their documented magnitude and fail if they drift.
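One way to read "allow-listed at their documented magnitude" is that each check compares its violation count to an exact expected number, so a known gap passes only while it stays the same size. The sketch below shows that pattern on toy data. The check names, the ALLOWED_VIOLATIONS table, the schema, and the use of SQLite are assumptions for illustration, and the tolerated count of 2 is a stand-in, not a figure from the demo warehouse.

```python
import sqlite3

# Hypothetical checks: each query counts rows that violate a rule.
CHECKS = {
    "order_date_present":
        "SELECT COUNT(*) FROM sales WHERE order_date IS NULL",
    "sales_equals_quantity_times_price":
        "SELECT COUNT(*) FROM sales WHERE amount <> quantity * price",
}

# Hypothetical allow-list: exact violation count tolerated per check.
# Anything not listed must have zero violations.
ALLOWED_VIOLATIONS = {"order_date_present": 2}


def run_checks(conn):
    """Return {check name: (passed, found, expected)}."""
    results = {}
    for name, sql in CHECKS.items():
        found = conn.execute(sql).fetchone()[0]
        expected = ALLOWED_VIOLATIONS.get(name, 0)
        # Exact equality: the check fails if the gap grows OR shrinks.
        results[name] = (found == expected, found, expected)
    return results


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_date TEXT, quantity INTEGER, price INTEGER, amount INTEGER)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2013-07-01", 2, 50, 100),
    (None, 1, 30, 30),   # known gap 1
    (None, 3, 10, 30),   # known gap 2
])
baseline = run_checks(conn)

# A third undated row appears: the allow-listed check now fails.
conn.execute("INSERT INTO sales VALUES (NULL, 1, 5, 5)")
drifted = run_checks(conn)

print(baseline)
print(drifted)
```

Using equality instead of an upper bound means a silent improvement in the source data also turns the check red, which forces the documentation to be updated alongside the data.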

Observed

2026-06-13: 33/33 checks passed. This suite has also failed honestly: it shipped at 32/33 with the failing check left red on purpose until the underlying defect (implausible birthdates, DQ-1) was cleansed — filed, fixed, and documented in app\docs\evidence.md.

Claim 5 — Rebuilds are reproducible: same data, same fingerprint

Reproduce

python app\warehouse\build.py
python app\warehouse\build.py

What you should observe

Each build records a build id, per-layer row counts, total revenue, and a deterministic content fingerprint in meta.build_info. Two consecutive builds of the same inputs get distinct build ids but the identical fingerprint: the warehouse is a pure function of its inputs. The fingerprint value itself changes only when the schema or data changes (by design: it hashes every relation), so the claim is the property (same inputs, same fingerprint), not any one hex value.
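The distinction between a per-build id and a content fingerprint can be sketched as follows. This is an illustrative toy, not the logic in app\warehouse\build.py: the fingerprint function, the choice of SHA-256, the relation names, and the in-memory representation of relations are all assumptions.

```python
import hashlib
import uuid


def fingerprint(relations):
    """Hash every relation's name, columns, and rows in a canonical order."""
    h = hashlib.sha256()
    for name in sorted(relations):
        columns, rows = relations[name]
        h.update(repr((name, tuple(columns))).encode("utf-8"))
        # Sort rows so that physical row order does not affect the result.
        for row in sorted(rows, key=repr):
            h.update(repr(row).encode("utf-8"))
    return h.hexdigest()


def build(relations):
    """A build gets a fresh random id but a content-derived fingerprint."""
    return {"build_id": uuid.uuid4().hex, "fingerprint": fingerprint(relations)}


# Toy relations, not the demo warehouse.
warehouse = {
    "silver.sales": (("order_id", "amount"), [("SO1", 100), ("SO2", 250)]),
    "silver.customers": (("customer_id",), [("C1",), ("C2",)]),
}

first = build(warehouse)
second = build(warehouse)

# Same content loaded in a different row order.
reordered = dict(warehouse)
reordered["silver.sales"] = (("order_id", "amount"), [("SO2", 250), ("SO1", 100)])

# Adding a relation changes the fingerprint even though sales are untouched.
extended = dict(warehouse)
extended["silver.channels"] = (("channel",), [("web",)])

print(first["build_id"] != second["build_id"])             # ids differ
print(first["fingerprint"] == second["fingerprint"])       # content identical
print(fingerprint(reordered) == first["fingerprint"])      # order-insensitive
print(fingerprint(extended) == first["fingerprint"])       # new relation hashed
```

Because every relation feeds the hash, adding a table moves the fingerprint even when existing figures are unchanged, which is why the property matters more than any particular value.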

Observed

On the current build the fingerprint is 37d27fd64a4fa1e2ed420fe9e77620d1, identical across consecutive builds and re-verified by every gate run (the gate rebuilds the warehouse mid-suite and asserts the fingerprint is unchanged while the build id flips). It changed from the earlier 3e02a4b72bbd66a222d0ff9dc808e7fb on 2026-06-13 when the canonical connector seam added three silver tables — the audited business numbers were unchanged; the fingerprint moved because new relations were hashed. That traceable change is the mechanism working, not a regression.

Claim 6 — The DuckDB → PostgreSQL port is number-for-number equivalent

Reproduce

python app\warehouse\build.py
python app\scripts\run_quality_checks.py

Full methodology and the audited DuckDB baseline figures: app\docs\evidence.md and app\CHANGELOG.md (Week 1, engine port entry). The frozen DuckDB baseline file is kept on disk untouched.

What you should observe

Every number exact across engines: total revenue 29,356,250 over 27,659 orders; bronze/silver/gold row counts (60,398 fact rows, 18,484 customers, 295 products); repeat split 62.86% / 37.14% with 77.02% of revenue from repeat customers; LTV avg 1,588.10 / median 272.00 / p90 4,825.70 / max 13,294; spot cells like 2013-07 revenue = 1,371,595 and cohort 2013-01 month-1 retention = 5.23%.

Observed

2026-06-12: every compared figure identical between the audited DuckDB baseline and the PostgreSQL warehouse; quality suite 33/33 on the new engine.

Claim 7 — Every headline metric was independently validated: 10/10 confirmed

Reproduce

The validation queries are pasted verbatim in app\docs\evidence.md. They are written against the silver tables with natural keys and deliberately do not reuse the gold views' SQL, so agreement is a cross-check rather than a tautology. Re-run any of them against the built warehouse.

What you should observe

Full-table diffs return 0 disagreeing rows (all 38 monthly revenue rows; all 430 cohort cells), and raw-row traces reconcile to the source: for example, the top customer's 13 raw lines sum to their 13,294 LTV, line by line, and the top product's 620 raw rows sum to 1,373,454 in bronze, pre-cleansing.

Observed

2026-06-12: 10/10 claims CONFIRMED by the adversarial validation pass (mandate: "assume the warehouse is wrong until proven otherwise"). The same pass filed a real defect it found along the way; see claim 4.
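A full-table diff of the kind described under this claim can be sketched like this: compute the metric a second way that shares no SQL with the view, then list every row on which the two disagree. The table and view names, the schema, and the use of SQLite are hypothetical, and the real validation queries are the ones in app\docs\evidence.md.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE silver_sales (order_date TEXT, amount INTEGER);
INSERT INTO silver_sales VALUES
    ('2013-07-01', 100), ('2013-07-15', 50), ('2013-08-02', 70);
CREATE VIEW gold_monthly_revenue AS
    SELECT substr(order_date, 1, 7) AS month, SUM(amount) AS revenue
    FROM silver_sales
    GROUP BY substr(order_date, 1, 7);
""")


def independent_monthly_revenue(conn):
    """Recompute monthly revenue from raw rows without touching the view."""
    totals = {}
    for order_date, amount in conn.execute(
        "SELECT order_date, amount FROM silver_sales"
    ):
        month = order_date[:7]
        totals[month] = totals.get(month, 0) + amount
    return set(totals.items())


def disagreeing_rows(gold_rows, independent_rows):
    """Symmetric difference: rows present on one side but not the other."""
    return sorted(set(gold_rows) ^ set(independent_rows))


gold = conn.execute("SELECT month, revenue FROM gold_monthly_revenue").fetchall()
independent = independent_monthly_revenue(conn)

clean_diff = disagreeing_rows(gold, independent)

# Simulate a wrong view output to show what a disagreement looks like.
tampered = [("2013-07", 151), ("2013-08", 70)]
bad_diff = disagreeing_rows(tampered, independent)

print(clean_diff)
print(bad_diff)
```

An empty diff is only meaningful because the second computation takes a different path to the number; reusing the view's own SQL would agree with itself by construction.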

Claim 8 — When the data doesn't exist, the agent refuses in governed text

Reproduce

python app\scripts\ask.py "Show me the conversion funnel"

What you should observe

No receipted facts, no approximation, no made-up funnel. A REFUSAL section quoting the governed "Not yet supported" text from app\docs\metrics.md: order data alone cannot recover pre-purchase steps, and no proxy is acceptable. The refusal is written to the audit log.

Observed

2026-06-13: governed refusal, verbatim — "Not yet supported (session_funnel), per app/docs/metrics.md: visits → product views → add-to-cart → checkout → purchase conversion. Requires an event-stream connector (storefront analytics)." (Full output on the product page.)

Claim 9 — Every number in the admin UI traces to a governed metric

Reproduce

python app\scripts\smoke_admin.py

What you should observe

31 deterministic checks: all five admin pages render with the real governed values (total revenue 29,356,250; 2013-07 = 1,371,595; cohort 2013-01 retention 5.23%; the live warehouse fingerprint), the deterministic Ask path returns receipted facts with a receipt id, and the JSON API stays unbroken.

Observed

2026-06-13: SMOKE_ADMIN PASSED — all checks green.

What we don't claim

No customers, no testimonials, no revenue-lift numbers, no "stores created" counter — Tilltrend is pre-launch and has none of these. No funnel or promo analytics — the connectors that would make them honest don't exist yet (claim 8 shows the product saying so itself). When any of this changes, the claim lands here first, with its command.

Found a claim on this site without a receipt? That's a bug. Report it and we'll fix it or retract the claim.