Statistical process control over your Parquet, powered by DuckDB —
and the case for one little constant: 2.66
A metric moved. Someone got paged. Was it real?
| Mistake | What it costs |
|---|---|
| Chasing routine noise | wasted investigations — and tampering: reacting to a stable process provably increases its variation |
| Dismissing a real shift | the regression ships, the pump fails, the fraud continues |
Both failure modes come from answering the wrong question: "did the number change?" It always changed.
This process is perfectly stable. Nothing happens — all day, every day. Every point is different. No point has an explanation.
The only question worth asking: did the process that generates the number change?
Common-cause (routine) variation: the noise inherent to the process. Unexplainable point by point, and predictable in range.
Response: leave it alone (or improve the system).
Assignable-cause variation: variation with a findable cause that is not part of the process.
Response: go find it. This page is worth answering.
A process behaviour chart exists to tell these apart — so people stop doing it by vibes.
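$$
\bar{X} = \operatorname{mean}(\text{baseline}) \qquad \overline{mR} = \operatorname{mean}\big(\lvert x_i - x_{i-1} \rvert\big)
$$
$$
\text{UNPL} = \bar{X} + 2.66\,\overline{mR} \qquad \text{LNPL} = \bar{X} - 2.66\,\overline{mR}
$$
UNPL and LNPL are the upper and lower natural process limits. They are computed once from the baseline, frozen, then extended forever.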
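$2.66 = 3 / d_2 = 3 / 1.128$. The constant $d_2 = 1.128$ converts the average moving range of two consecutive points into a sigma estimate, $\hat\sigma = \overline{mR} / 1.128$, so three sigma is $3\hat\sigma \approx 2.66\,\overline{mR}$.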
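A minimal sketch of the limit computation above, in Python since the deck's notebook is Python. The function name `natural_process_limits` and the constant name `K` are hypothetical, not duck-spc's API; the baseline is assumed to be a plain list of individual values in time order.

```python
from statistics import fmean

K = 2.66  # = 3 / 1.128: three sigma expressed in units of mR-bar


def natural_process_limits(baseline: list[float]) -> tuple[float, float, float]:
    """Return (LNPL, center, UNPL) for an XmR chart from baseline individual values."""
    if len(baseline) < 2:
        raise ValueError("need at least two points to form a moving range")
    center = fmean(baseline)  # X-bar
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = fmean(moving_ranges)  # mR-bar: average of |x_i - x_(i-1)|
    return center - K * mr_bar, center, center + K * mr_bar


lnpl, center, unpl = natural_process_limits([10, 12, 11, 13, 12])
print(round(lnpl, 2), round(center, 2), round(unpl, 2))  # 7.61 11.6 15.59
```

Note that the standard deviation of the baseline never appears: sigma comes only from point-to-point differences.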
People will pressure you to use 2 ("more sensitive") or 3.5 ("fewer pages"). Refuse. Tuning the constant is how a chart degenerates back into an arbitrary threshold.
Nothing so far assumed the data was normal. Every distribution below is standardized to the same mean and variance, so the ±3σ lines never move. Watch the shape go pathological while the red tail past 3σ stays tiny.
| What you're willing to assume | P(stable point beyond 3σ) |
|---|---|
| Nothing at all (finite variance) — Chebyshev | ≤ 1/9 ≈ 11.1% |
| Unimodal, that's it — Vysochanskij–Petunin | ≤ 4/81 ≈ 4.9% |
| Normal (the familiar case) | 0.27% |
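The three rows of the table can be reproduced with the standard library. A short sketch, with `K = 3` as the distance from the mean in standard deviations; all three figures are two-sided tail probabilities.

```python
import math

K = 3.0  # distance from the mean, in standard deviations

chebyshev = 1 / K**2  # any distribution with finite variance
vysochanskij_petunin = 4 / (9 * K**2)  # unimodal only; valid for K > sqrt(8/3)
normal = math.erfc(K / math.sqrt(2))  # exact two-sided normal tail, P(|Z| > K)

for name, p in [
    ("Chebyshev", chebyshev),
    ("Vysochanskij-Petunin", vysochanskij_petunin),
    ("Normal", normal),
]:
    print(f"{name:22s} {p:.4f}")
```

The first two are worst-case upper bounds over whole families of distributions; only the normal row is an exact probability.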
Simulated stable processes (2,000 trials each): 28-point baseline, mR̄-estimated sigma, frozen limits — then count false alarms on 500 in-control points. Estimation error included. Nothing hidden.
Every monster lands under the unimodal bound and at less than half of Chebyshev's ceiling. You don't need to know your distribution.
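A stdlib-only sketch of a simulation with the setup described above: a 28-point baseline, limits frozen at X̄ ± 2.66·mR̄, then 500 in-control points per trial. The function `false_alarm_rate` and its parameters are illustrative, not the notebook's code. Because a fresh baseline is drawn in every trial, estimation error is part of the measured rate.

```python
import random
from statistics import fmean
from typing import Callable


def false_alarm_rate(
    sample: Callable[[random.Random], float],
    trials: int = 2000,
    baseline_n: int = 28,
    monitor_n: int = 500,
    seed: int = 0,
) -> float:
    """Fraction of in-control points falling outside limits frozen from a short baseline."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(trials):
        baseline = [sample(rng) for _ in range(baseline_n)]
        center = fmean(baseline)
        mr_bar = fmean(abs(b - a) for a, b in zip(baseline, baseline[1:]))
        lo, hi = center - 2.66 * mr_bar, center + 2.66 * mr_bar  # frozen for this trial
        # Monitored points come from the same unchanged process: every hit is a false alarm.
        alarms += sum(not lo <= sample(rng) <= hi for _ in range(monitor_n))
    return alarms / (trials * monitor_n)
```

Any sampler plugs in, e.g. `false_alarm_rate(lambda r: r.gauss(0.0, 1.0))` or the skewed `false_alarm_rate(lambda r: r.expovariate(1.0))`. Standardizing the sampler is unnecessary for this procedure: shifting or rescaling every value moves X̄ and mR̄ with it, so the rate is unchanged.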
Same data, same 3-sigma shift. The global SD is inflated by the very signal you're hunting — its limits swell until the chart goes blind. Never compute limits as mean ± 3·std(data).
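A sketch of this failure mode on hypothetical simulated data (50 in-control points, then a 3σ upward shift). To isolate the sigma estimate, both versions use the same center line computed over all the data; only sigma differs.

```python
import random
from statistics import fmean, stdev

rng = random.Random(42)
# Hypothetical data: 50 in-control points, then a sustained 3-sigma upward shift.
data = [rng.gauss(0, 1) for _ in range(50)] + [rng.gauss(3, 1) for _ in range(50)]

center = fmean(data)
sigma_global = stdev(data)  # inflated: the shift itself is counted as spread
# Moving ranges see the shift as a single large jump, so sigma barely moves.
sigma_mr = fmean(abs(b - a) for a, b in zip(data, data[1:])) / 1.128


def count_outside(values, center, sigma):
    """Number of points more than 3 sigma from the center line."""
    return sum(abs(x - center) > 3 * sigma for x in values)


flags_global = count_outside(data, center, sigma_global)
flags_mr = count_outside(data, center, sigma_mr)
print(f"sigma: global={sigma_global:.2f}  mR-based={sigma_mr:.2f}")
print(f"points flagged: global={flags_global}  mR-based={flags_mr}")
```

With the same center, the mR-based limits sit inside the global ones, so every point the global chart flags is also flagged by the mR-based chart, and typically more.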
Same shift hits both charts. The frozen limits keep firing; the rolling window quietly swallows the shift into its own baseline and goes blind.
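A sketch contrasting the two on hypothetical simulated data: limits frozen from the first 28 points versus limits recomputed from each point's trailing 28-point window. Signals are counted only once the trailing window lies entirely after the shift.

```python
import random
from statistics import fmean

WINDOW = 28  # baseline length, matching the 28-point baseline used earlier
SHIFT_AT = 100  # index where the hypothetical 3-sigma shift begins


def limits(window):
    """(LNPL, UNPL) from individual values: X-bar +/- 2.66 * mR-bar."""
    center = fmean(window)
    mr_bar = fmean(abs(b - a) for a, b in zip(window, window[1:]))
    return center - 2.66 * mr_bar, center + 2.66 * mr_bar


rng = random.Random(7)
series = [rng.gauss(0, 1) for _ in range(SHIFT_AT)] + [rng.gauss(3, 1) for _ in range(100)]

frozen_lo, frozen_hi = limits(series[:WINDOW])  # computed once, never recomputed

frozen_hits = rolling_hits = 0
# Only points whose trailing window lies entirely after the shift.
for i in range(SHIFT_AT + WINDOW, len(series)):
    x = series[i]
    roll_lo, roll_hi = limits(series[i - WINDOW:i])  # recomputed at every point
    frozen_hits += not frozen_lo <= x <= frozen_hi
    rolling_hits += not roll_lo <= x <= roll_hi

print(f"signals after the shift settles: frozen={frozen_hits}  rolling={rolling_hits}")
```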
`check` never recomputes the limits.

| Rule | Signal |
|---|---|
| Rule 1 | a point outside the natural process limits |
| Rule 2 | nine consecutive points on one side of the center line (catches sustained smaller shifts) |
The Western Electric handbook lists more. Every rule you add buys sensitivity with false alarms — and each false alarm consumes an investigation and erodes trust in the chart.
Keeping the rules minimal follows the same philosophy as fixing the constant: resist the urge to tune.
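The two rules fit in a few lines. A sketch with hypothetical function names; treating a point exactly on the center line as breaking a run is an assumption, since the rule as stated does not say.

```python
def rule_1(values, lnpl, unpl):
    """Indices of points outside the natural process limits."""
    return [i for i, x in enumerate(values) if x < lnpl or x > unpl]


def rule_2(values, center, run_length=9):
    """Indices at which a run of `run_length` or more points on one side of the center is reached."""
    hits, run, side = [], 0, 0
    for i, x in enumerate(values):
        s = (x > center) - (x < center)  # +1 above, -1 below, 0 exactly on the line
        if s == 0:
            run = 0  # assumption: a point on the center line breaks the run
        elif s == side:
            run += 1
        else:
            run = 1  # first point of a new run on the other side
        side = s
        if run >= run_length:
            hits.append(i)
    return hits


print(rule_1([0, 5, -5, 1], lnpl=-3, unpl=3))  # [1, 2]
print(rule_2([1] * 10, center=0))  # [8, 9]
```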
```shell
$ duck-spc baseline \
    --source 's3://bucket/events/' \
    --value latency_ms \
    --group-by region,service \
    --derive day:p95 \
    --window 2026-01-01:2026-01-29 \
    > limits.json
$ duck-spc check --limits limits.json
# exit 0 → stable. go back to sleep.
# exit 1 → the process changed.
```
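The `baseline`/`check` split can be sketched in a few lines of Python. This illustrates the contract described above (limits computed once per group, stored, then only compared against, with the exit code as the answer); it is not duck-spc's implementation. The JSON layout, the group keys, and the function names are all hypothetical.

```python
import json
import sys
from statistics import fmean


def baseline(groups: dict[str, list[float]]) -> str:
    """Compute frozen limits per group once and serialize them (hypothetical schema)."""
    out = {}
    for key, values in groups.items():
        center = fmean(values)
        mr_bar = fmean(abs(b - a) for a, b in zip(values, values[1:]))
        out[key] = {"center": center, "lnpl": center - 2.66 * mr_bar, "unpl": center + 2.66 * mr_bar}
    return json.dumps(out)


def check(limits_json: str, latest: dict[str, float]) -> int:
    """Compare new values to stored limits without recomputing them; return an exit code."""
    limits = json.loads(limits_json)
    breached = [k for k, x in latest.items() if not limits[k]["lnpl"] <= x <= limits[k]["unpl"]]
    for k in breached:
        print(f"signal: {k} = {latest[k]}", file=sys.stderr)
    return 1 if breached else 0  # 0: stable; 1: the process changed
```

For example, after `limits = baseline({"eu/api": [10, 12, 11, 13, 12]})`, `check(limits, {"eu/api": 12})` returns 0 and `check(limits, {"eu/api": 20})` returns 1; a CLI wrapper would pass that value to `sys.exit`.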
`read_parquet()`: thousands of streams, one scan, nothing materialized but answers
Derived series (`day:p95`, `day:rate`, … `diff`) for seasonal, trending, or noisy raw data
Go back to sleep.
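If `day:p95` means what it appears to (one 95th-percentile value per day, charted instead of raw events), the derivation is a group-then-quantile step. A hypothetical stdlib sketch for a single group; duck-spc presumably does this inside its DuckDB scan instead.

```python
from collections import defaultdict
from statistics import quantiles


def daily_p95(events):
    """events: iterable of (day, latency_ms) pairs -> {day: p95 latency}, days in sorted order."""
    by_day = defaultdict(list)
    for day, latency in events:
        by_day[day].append(latency)
    # quantiles(n=100) returns the 99 percentile cut points; index 94 is the 95th.
    # Each day needs at least two events (older Pythons reject a single data point).
    return {day: quantiles(vals, n=100)[94] for day, vals in sorted(by_day.items())}


events = [("2026-01-01", v) for v in range(1, 101)] + [("2026-01-02", v) for v in range(50, 150)]
print(daily_p95(events))
```

The resulting one-value-per-day series is what would feed the XmR limits, rather than the raw, heavily skewed latencies.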
duck-spc · Postgres-and-a-bucket lineage · roadmap: DuckLake sources, nonparametric limits, live ingestion
notebook: `notebooks/trust_the_limits.py` (every number in this deck is computed there)