How to Calibrate Your MMM Without Waiting for Lift Tests

Detect true incremental ROI per channel from historical spend data — in less than a coffee break. The methodology, worked outputs, and when to use each approach.

How to detect iROI without waiting weeks for a geo-experiment

Gabriele Franco, Co-Founder & CEO, Cassandra

You can detect true incremental ROI per channel from your existing historical spend data without running a geo-experiment. The method works by identifying periods in your past data where channel spend shifted significantly — creating natural variation events that function as observational quasi-experiments. Run against those periods, a detection algorithm returns iROI estimates with confidence intervals, channel by channel, in minutes rather than months. This is not a shortcut that trades accuracy for speed. It is a different experimental design — one that exploits variation that already occurred in your data rather than manufacturing new variation through a controlled holdout. The result is a calibration signal you can use immediately to anchor your Marketing Mix Model to causal reality.

Why ROAS does not measure what you think it measures {#why-roas-misleads}

Platform-reported ROAS counts every conversion associated with an ad impression or click as a caused conversion. It does not ask: would this revenue have occurred without the ad? The revenue from a branded search click that a customer would have completed anyway is counted the same as revenue from an upper-funnel video that genuinely shifted purchase intent.

The gap between reported ROAS and true incremental contribution is not a rounding error. In Marketing Mix Models built on the Cassandra platform, the median spread between attribution-reported channel contribution and MMM-derived incremental contribution is substantial, and for channels that collect heavy attribution credit the bias typically runs in one direction: attribution overstates, incrementality corrects downward.

iROI (incremental ROI) is the metric that answers the counterfactual: what revenue would you have lost if you had spent zero on this channel? ROAS cannot answer that question. It was not designed to. The difference matters because budget allocation decisions are counterfactual by nature — every dollar moved from one channel to another is a bet on what would have happened.
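To make the distinction concrete, here is a minimal sketch with made-up numbers. The function names and every figure are illustrative assumptions, not outputs from the Cassandra platform; in practice the counterfactual revenue has to be estimated rather than observed.

```python
def roas(attributed_revenue: float, spend: float) -> float:
    """Platform-style ROAS: all revenue credited to the channel, per dollar spent."""
    return attributed_revenue / spend


def iroi(actual_revenue: float, counterfactual_revenue: float, spend: float) -> float:
    """Incremental ROI: revenue above the zero-spend counterfactual, per dollar spent."""
    return (actual_revenue - counterfactual_revenue) / spend


# Hypothetical figures for one channel over one period.
spend = 10_000.0
attributed_revenue = 50_000.0       # revenue the ad platform credits to the channel
actual_revenue = 200_000.0          # total business revenue in the period
counterfactual_revenue = 180_000.0  # estimated revenue had channel spend been zero

print(f"ROAS: {roas(attributed_revenue, spend):.2f}")
print(f"iROI: {iroi(actual_revenue, counterfactual_revenue, spend):.2f}")
```

In this hypothetical case the platform reports five dollars per dollar spent, while only two dollars per dollar would actually have been lost by turning the channel off.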

If your MMM is calibrated with attribution data rather than incremental evidence, the model learns to replicate the attribution bias, not the true causal structure. The budget recommendations it produces will be wrong in a predictable direction: over-allocating to channels with high attribution credit, under-allocating to channels whose contribution is largely uncredited.

For context on how risk and variance compound this problem across channels, see the risk-adjusted returns framework which examines why even a correctly measured ROI can lead to poor budget decisions if it ignores volatility.

What a geo-experiment actually measures — and why it takes months {#geo-experiment-timeline}

A geo-experiment is a controlled holdout: you select matched geographic markets, hold out ad spend in a treatment group, maintain spend in a control group, and measure the revenue difference. Done correctly, it produces a clean causal estimate of lift for one channel in one geographic context over the test period.

The problem is operational, not methodological. A properly powered geo-experiment requires:

Design phase (2–4 weeks): Geo selection based on market match quality, power analysis to determine how many markets and what test duration you need to detect a meaningful effect, internal alignment on the question being tested and the channel being held out.

Execution phase (4–8 weeks): The holdout must run long enough to accumulate statistical power. Most well-designed tests require 4–8 weeks minimum. Seasonal contamination, promotional events, or platform algorithm changes during the window can invalidate results.

Analysis phase (2–4 weeks): Causal inference on geo-level data, adjusting for pre-period trends and market-specific confounders, followed by extrapolation to your full media mix.

Total timeline: 8 to 16 weeks from design to results in a best-case scenario. In practice, most teams report 3 to 5 months from decision to actionable output, once internal coordination and seasonal constraints are factored in.

That timeline means a channel you need to evaluate today will not have validated incremental lift data until Q3, Q4, or beyond. For a team trying to calibrate an MMM ahead of a budget planning cycle, the geo-experiment is usually too slow to be the first answer.

Natural experiments already exist in your historical spend data {#natural-experiments-methodology}

Your spend data already contains variation — it just was not controlled. Every time your channel budget shifted significantly, a natural variation event occurred. If you cut Meta spend by 40% in March for internal reasons, then ramped it back in May, that ramp-down and ramp-up is a quasi-experiment. Revenue moved in the same direction, or it did not. The pattern of that movement — controlling for other channels, seasonality, and baseline trends — contains causal signal about Meta's incremental contribution.

This is not a new idea in econometrics. Natural experiments and difference-in-differences estimation have been foundational to causal inference research for decades. What has changed is the tooling: identifying valid natural experiment periods in noisy marketing data, and running the appropriate causal model against them, used to require a data science team and days of work. It can now be automated.

The conditions for a valid natural experiment in your spend data are:

  • A channel's spend shifted meaningfully and sustainably relative to its own baseline

  • The change was not caused by a simultaneous revenue shock (which would create reverse causality)

  • The geographic structure of the data allows a clean comparison — periods where spend moved uniformly across all markets are excluded

When those conditions are met, the variation event can be used to estimate the channel's causal contribution to revenue in that period — a causal estimate, not an attributed one.

The detection methodology: how Cassandra identifies quasi-experimental periods {#detection-methodology}

The Instant Incrementality tool automates the natural experiment detection process. Here is what runs under the hood:

Step 1 — Variation event detection. The algorithm scans your channel-level spend history for periods where spend shifted meaningfully and sustainably above or below its own baseline. Both increases and decreases are detected. Each qualifying period is flagged as a candidate quasi-experiment.
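As a rough illustration of Step 1, the sketch below flags weeks where spend sits at least 20% above or below a trailing-median baseline and keeps only shifts that last three weeks or more. The function name, the 8-week baseline window, the 20% threshold, and the 3-week minimum are all assumptions made for this example; the tool's actual detection rule is not published here.

```python
from statistics import median


def detect_variation_events(spend, baseline_weeks=8, threshold=0.20, min_weeks=3):
    """Return (start_week, end_week, direction) tuples for sustained spend shifts.

    Assumes weekly spend with a positive baseline. All parameters are
    illustrative defaults, not the tool's settings.
    """
    history = list(spend[:baseline_weeks])  # unshifted weeks that define the baseline
    events = []
    open_event = None  # (start_week, direction) of the shift currently in progress

    for week in range(baseline_weeks, len(spend)):
        baseline = median(history[-baseline_weeks:])
        if baseline <= 0:
            raise ValueError("baseline spend must be positive")
        change = (spend[week] - baseline) / baseline
        if change >= threshold:
            direction = "increase"
        elif change <= -threshold:
            direction = "decrease"
        else:
            direction = None

        # Close the open event when the shift ends or flips direction.
        if open_event is not None and open_event[1] != direction:
            if week - open_event[0] >= min_weeks:
                events.append((open_event[0], week - 1, open_event[1]))
            open_event = None

        if direction is None:
            history.append(spend[week])  # only normal weeks update the baseline
        elif open_event is None:
            open_event = (week, direction)

    # An event still open at the end of the series counts if it lasted long enough.
    if open_event is not None and len(spend) - open_event[0] >= min_weeks:
        events.append((open_event[0], len(spend) - 1, open_event[1]))
    return events


# Hypothetical weekly spend: 8 stable weeks, a 4-week 40% cut, then back to baseline.
meta_spend = [100.0] * 8 + [60.0] * 4 + [100.0] * 4
print(detect_variation_events(meta_spend))
```

Freezing the baseline while a shift is in progress keeps a long cut from dragging the baseline down and hiding itself. A one-week blip is ignored because it never reaches the minimum duration.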

Step 2 — Confound filtering. For each candidate event, the algorithm checks whether the geographic structure needed for clean causal comparison is available. If a channel's spend moved uniformly across all markets simultaneously during the candidate window, no clean comparison group exists and the event is excluded.
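One simple way to express the Step 2 check is to ask whether, during the candidate window, at least one market shifted while at least one other stayed near its baseline. The sketch below does that; the function name and both thresholds are hypothetical, and the real filter may use a different criterion.

```python
def has_comparison_group(change_by_market, shift_threshold=0.20, stable_tolerance=0.05):
    """Return True when some markets shifted spend and others stayed near baseline.

    change_by_market maps a market name to its relative spend change during the
    candidate window (for example -0.40 for a 40% cut). Thresholds are
    illustrative assumptions.
    """
    shifted = [m for m, c in change_by_market.items() if abs(c) >= shift_threshold]
    stable = [m for m, c in change_by_market.items() if abs(c) <= stable_tolerance]
    return bool(shifted) and bool(stable)


# Spend cut everywhere at once: no market can serve as a comparison group.
uniform_cut = {"north": -0.40, "south": -0.41, "west": -0.39}
# Spend cut in one market only: the other two can act as the comparison.
partial_cut = {"north": -0.40, "south": 0.01, "west": -0.02}

print(has_comparison_group(uniform_cut))   # False
print(has_comparison_group(partial_cut))   # True
```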

Step 3 — Revenue response modeling. For each clean event, a causal inference model constructs a counterfactual revenue baseline for the spend-change window and measures the actual revenue response against it. Trend and seasonality are absorbed through the structure of the comparison itself. The output is a per-event causal estimate of lift per dollar.
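The article does not specify which causal model the tool uses for Step 3. As a stand-in, the sketch below uses a plain difference-in-differences estimate: the revenue trend in comparison markets is added to the treated markets' pre-period level to form the counterfactual, and the gap to actual revenue is divided by the spend change. The function name and all figures are hypothetical, and the approach assumes treated and comparison markets would have moved in parallel without the spend change.

```python
from statistics import mean


def did_lift_per_dollar(treated_pre, treated_post, control_pre, control_post, spend_change):
    """Difference-in-differences lift per dollar of spend change.

    Each revenue argument is a list of weekly revenue for that group and period.
    spend_change is the average weekly spend change in the treated markets
    (post minus pre), so a budget cut is negative.
    """
    control_trend = mean(control_post) - mean(control_pre)
    counterfactual_post = mean(treated_pre) + control_trend  # expected revenue without the change
    lift = mean(treated_post) - counterfactual_post
    return lift / spend_change


# Hypothetical event: spend cut by $4,000 per week in the treated markets.
treated_pre = [100_000, 102_000, 98_000]
treated_post = [92_000, 90_000, 94_000]
control_pre = [80_000, 82_000, 78_000]
control_post = [84_000, 86_000, 82_000]

estimate = did_lift_per_dollar(treated_pre, treated_post, control_pre, control_post, -4_000)
print(f"Lift per dollar: {estimate:.2f}")  # 3.00
```

Here the comparison markets grew by $4,000 per week, so the treated markets were expected to reach $104,000. They reached $92,000 instead, a $12,000 shortfall against a $4,000 spend cut, which gives three dollars of revenue per dollar of spend.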

Step 4 — Aggregation and interval computation. Estimates across all valid events for a channel are aggregated into a single iROI estimate with a 90% confidence interval. Wider intervals indicate fewer or noisier events; tighter intervals indicate more and cleaner quasi-experimental periods.
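One standard way to combine per-event estimates into a single figure with an interval is inverse-variance weighting, sketched below. This is an assumption about how such an aggregation could work, not a description of the tool's internals; the function name and the event figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def aggregate_events(estimates, std_errors, confidence=0.90):
    """Pool per-event iROI estimates with inverse-variance weights.

    Returns (pooled_estimate, ci_lower, ci_upper) using a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # precise events count for more
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for a 90% interval
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se


# Hypothetical per-event estimates and standard errors for one channel.
few_events = aggregate_events([3.8, 4.2], [0.5, 0.4])
more_events = aggregate_events([3.8, 4.2, 4.1, 4.0], [0.5, 0.4, 0.6, 0.5])

for label, (point, lower, upper) in (("2 events", few_events), ("4 events", more_events)):
    print(f"{label}: {point:.2f} (90% CI {lower:.2f} to {upper:.2f})")
```

Adding events shrinks the pooled standard error, which is why a longer history with more clean variation events yields a tighter interval.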

Step 5 — Calibration export. The resulting iROI estimates can be used directly as calibration priors in a Bayesian MMM. Instead of letting the model freely estimate channel contribution from spend-revenue correlation, you anchor it to the causal estimates produced by natural experiment detection.
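How an iROI estimate becomes a prior depends on the MMM library you use. A minimal sketch, assuming a normal prior centered on the point estimate with a standard deviation backed out of the 90% interval, looks like this. The function name is hypothetical, and the input figures are the Search demo values reported later in this article.

```python
from statistics import NormalDist


def ci_to_normal_prior(point, lower, upper, confidence=0.90):
    """Convert a point estimate and confidence interval into normal prior parameters.

    Assumes the interval is roughly symmetric and normal; an asymmetric interval
    may call for a different prior family.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for a 90% interval
    return {"mean": point, "sd": (upper - lower) / (2 * z)}


# Search demo output: iROI 4.09 with a 90% interval of 3.50 to 4.38.
search_prior = ci_to_normal_prior(4.09, 3.50, 4.38)
print(f"Normal(mean={search_prior['mean']:.2f}, sd={search_prior['sd']:.3f})")
```

The Search interval is not centered on its point estimate, so treating it as symmetric is a simplification. A wider interval translates into a larger standard deviation, which lets the model's own data pull the channel estimate further from the prior.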

The entire process runs in minutes on a standard CSV upload. No SQL, no data science environment, no vendor engagement required.

Reading iROI outputs by channel: worked examples from the Instant Incrementality tool {#iroi-outputs-worked-examples}

The following outputs are drawn from the Instant Incrementality tool's demo dataset — a representative spend and revenue history across three channels. They illustrate how to read and act on iROI estimates.

Search — iROI: $4.09 per $1 (90% CI: $3.50–$4.38)

Every dollar spent on Search returns $4.09 in incremental revenue. The confidence interval is tight ($0.88 width), which means the estimate is stable across multiple quasi-experimental events in the history. The recommendation signal: scale it. At $4.09 iROI with high confidence, Search is generating incremental revenue well above breakeven and the estimate is reliable enough to justify increased allocation.

Performance Max — iROI: $2.49 per $1 (90% CI: not shown in demo)

PMax returns $2.49 incrementally per dollar. That is a positive incremental return, but a lower multiple than Search, and whether it clears breakeven depends on your contribution margin: $2.49 covers its cost only when contribution margin exceeds roughly 40%. The interpretation also depends on your budget constraint: if Search is not yet saturated, it should receive incremental budget before PMax. If Search is saturated, PMax is the next viable allocation target. Recommendation signal: scale it if it clears your breakeven, but only after confirming Search is at or near its saturation point.

Video — iROI: $1.00–$2.00 per $1 (wide interval)

The wide interval on Video signals low detection confidence — fewer or smaller spend variation events in the history, or higher co-movement with other channels during those events. The iROI range straddles near-breakeven territory. Recommendation signal: uncertain — do not cut the channel based on this alone, but do not scale spend without either a controlled test or a longer historical record that produces tighter estimates.

How to act on these outputs

One important note on interpretation: these are marginal estimates — they measure the incremental return at the observed level of spend change, not your channel's average historical ROI. Under diminishing returns, marginal return is lower than average return. Scaling spend means moving along the saturation curve — expect marginal return to decrease as you increase allocation beyond the observed level.
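The gap between marginal and average return is easy to see on a toy response curve. The sketch below assumes revenue follows a square-root curve in spend, which is purely an illustrative choice; the curve shape, the scale of 400, and the function names are hypothetical and are not something the tool estimates.

```python
def revenue(spend, scale=400.0, exponent=0.5):
    """Toy concave response curve: revenue grows with spend at a decreasing rate."""
    return scale * spend ** exponent


def average_return(spend):
    """Total revenue divided by total spend."""
    return revenue(spend) / spend


def marginal_return(spend, step=1.0):
    """Extra revenue from one more dollar at the current spend level."""
    return (revenue(spend + step) - revenue(spend)) / step


for level in (10_000.0, 20_000.0):
    print(
        f"spend ${level:,.0f}: "
        f"average {average_return(level):.2f}, marginal {marginal_return(level):.2f}"
    )
```

On this curve the marginal return is about half the average return at every spend level, and both fall as spend doubles. That is the pattern to expect when you scale a channel beyond the spend range covered by the observed variation events.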

The decision framework keys off your channel's breakeven ROAS, not a fixed dollar threshold:

| Condition | Action |
| --- | --- |
| CI lower bound > your channel's breakeven ROAS | Scale: incremental return clears the breakeven threshold even in the conservative scenario |
| CI straddles your breakeven ROAS | Hold: the evidence does not support scaling or cutting |
| CI upper bound < your channel's breakeven ROAS | Cut or pause: the channel does not cover its cost even in the optimistic scenario |
| Window not statistically significant or data insufficient | Run a controlled test: the historical signal is not clean enough to act on |

Breakeven ROAS = 1 ÷ your contribution margin. A business with 30% contribution margin breaks even at ROAS ≈ 3.3, not 1.0.
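The decision rules above can be written as a small function. This is a sketch under stated assumptions: the function names are hypothetical, a missing interval stands in for the "data insufficient" case, and the 60% contribution margin in the usage example is an invented figure chosen only to exercise each branch.

```python
def breakeven_roas(contribution_margin):
    """Breakeven ROAS = 1 / contribution margin (margin given as a fraction, e.g. 0.30)."""
    if not 0 < contribution_margin <= 1:
        raise ValueError("contribution_margin must be in (0, 1]")
    return 1.0 / contribution_margin


def recommend(ci_lower, ci_upper, contribution_margin):
    """Map an iROI confidence interval to an action using the breakeven threshold."""
    if ci_lower is None or ci_upper is None:
        return "run a controlled test"  # no usable interval from the historical data
    threshold = breakeven_roas(contribution_margin)
    if ci_lower > threshold:
        return "scale"
    if ci_upper < threshold:
        return "cut or pause"
    return "hold"


print(f"Breakeven at 30% margin: {breakeven_roas(0.30):.2f}")

# Demo intervals from this article, evaluated at a hypothetical 60% contribution margin.
demo_intervals = {"Search": (3.50, 4.38), "Video": (1.00, 2.00), "Performance Max": (None, None)}
for channel, (lower, upper) in demo_intervals.items():
    print(f"{channel}: {recommend(lower, upper, 0.60)}")
```

The same intervals can lead to different actions at a different margin, which is why the threshold is computed per business rather than fixed.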

For a portfolio-level view of how iROI calibration interacts with channel correlation and risk, see the Marketing Efficient Frontier framework.

Image suggestion 1: Screenshot of Instant Incrementality tool output panel showing the three-channel iROI estimates with confidence intervals and recommendation labels. Alt text: "Instant Incrementality tool output showing iROI estimates by channel with 90% confidence intervals — Search 4.09, PMax 2.49, Video 1–2."

When to use historical detection vs. a full geo-experiment {#historical-detection-vs-geo-experiment}

Neither method is universally superior. They answer slightly different questions and suit different operational constraints.

Use historical detection when:

  • You need calibration data within days, not months — budget cycle, board presentation, or MMM build is imminent

  • The channel has been active for 12+ weeks with clear spend variation in its history

  • You want a fast first-pass estimate before deciding whether a controlled test is worth the time cost

  • You have multiple channels to evaluate simultaneously — geo-experiments typically isolate one channel at a time

Use a controlled geo-experiment when:

  • The channel is new — no spend history exists to mine for natural experiments

  • The historical estimate has a very wide confidence interval, indicating the signal is too weak to act on

  • The budget decision is large enough that a few months of delay is worth the precision gain

  • The channel's contribution is contested internally and you need an unambiguous controlled result to resolve the debate

Use both in sequence:

The most rigorous approach is to run historical detection first, generate iROI estimates, and use those estimates to prioritize which channels to validate with a controlled experiment. Channels with wide confidence intervals from historical detection are exactly the ones where a geo-experiment will produce the most incremental learning. Channels with tight high-confidence intervals from historical detection may not need a controlled test — the natural experiment data is sufficient to calibrate the MMM.

This sequenced approach gets you calibration data immediately, while directing the expensive and slow geo-experiment budget toward the questions where it adds the most value.
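A simple way to turn that sequencing into a shortlist is to rank channels by how wide their interval is relative to its midpoint. The sketch below does this with the demo intervals from this article; the function name and the relative-width metric are assumptions chosen for illustration, and a channel with no interval is treated as the most uncertain.

```python
def rank_for_geo_test(intervals):
    """Order channels from most to least uncertain, based on relative interval width.

    intervals maps a channel name to (lower, upper), or to None when no
    interval is available.
    """
    def relative_width(item):
        _, bounds = item
        if bounds is None:
            return float("inf")  # no interval at all: highest priority for a controlled test
        lower, upper = bounds
        return (upper - lower) / ((upper + lower) / 2)

    ranked = sorted(intervals.items(), key=relative_width, reverse=True)
    return [name for name, _ in ranked]


demo = {"Search": (3.50, 4.38), "Performance Max": None, "Video": (1.00, 2.00)}
print(rank_for_geo_test(demo))  # ['Performance Max', 'Video', 'Search']
```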

A note on the quasi-experimental assumption. Confidence intervals from historical detection are conditional on one assumption: that the spend changes being detected were not themselves caused by anticipated demand shifts. If a channel's budget increased because the team expected that period to be strong regardless of spend, the detection algorithm cannot distinguish that from a genuine causal lift. Where the business decision is high-stakes enough to make this uncertainty consequential, a controlled experiment remains the more defensible choice.

Image suggestion 2: Decision tree diagram — "Do I have 12+ weeks of spend history?" → Yes → "Did spend vary >20% for any channel?" → Yes → "Run historical detection." No → "Run geo-experiment." Alt text: "Decision tree for choosing between historical incrementality detection and geo-experiment based on data availability and variation."

Image suggestion 3: Timeline comparison showing geo-experiment (8–16 weeks) vs. historical detection (minutes to days) side by side, with tradeoff annotation on CI width. Alt text: "Timeline comparison: geo-experiment takes 8–16 weeks; historical natural experiment detection returns results in minutes, with wider confidence intervals where data is sparse."

Frequently Asked Questions {#faq}

How accurate is detecting natural experiments in historical spend data? {#faq-accuracy}

Accuracy depends on how much clean, unconfounded spend variation exists in your history. The tool reports 90% confidence intervals on all estimates — the interval width is the accuracy signal. A tight interval means the data contains enough clean variation to support a confident estimate; a wide interval means the result is directional only and should not be used as a direct MMM calibration input without further validation. The tool does not hide uncertainty — it surfaces it explicitly so you can decide whether to act or to run a controlled test.

What data do I need to run incrementality detection on past spend? {#faq-data-requirements}

Your data needs geographic structure — channel spend and revenue split by geographic market, not national totals. The tool uses variation across markets to construct the comparison needed for causal estimation; nationally aggregated data does not provide enough structure to detect clean natural experiments. Beyond structure, you need at least 12 weeks of history with meaningful spend variation in the channels you want to evaluate. A 24-week or 52-week history with multiple spend variation events per channel produces tighter estimates and better calibration-grade output.
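Before uploading, you can check your file against these two requirements yourself. The sketch below assumes one row per week and market with keys named `week` and `market`; those key names and the function name are hypothetical, not the tool's required schema.

```python
def check_dataset(rows, min_weeks=12, min_markets=2):
    """Return a list of problems that would prevent natural experiment detection."""
    weeks = {row["week"] for row in rows}
    markets = {row["market"] for row in rows}
    problems = []
    if len(weeks) < min_weeks:
        problems.append(f"only {len(weeks)} weeks of history; need at least {min_weeks}")
    if len(markets) < min_markets:
        problems.append("no geographic structure: spend and revenue must be split by market")
    return problems


# Hypothetical geo-structured history: 12 weeks across two markets.
geo_rows = [
    {"week": week, "market": market, "spend": 1_000.0, "revenue": 5_000.0}
    for week in range(12)
    for market in ("north", "south")
]
# Hypothetical national totals: 8 weeks, a single aggregate market.
national_rows = [
    {"week": week, "market": "national", "spend": 2_000.0, "revenue": 10_000.0}
    for week in range(8)
]

print(check_dataset(geo_rows))       # []
print(check_dataset(national_rows))  # two problems: too short, no geo split
```

Passing this check only confirms the structure. Whether the history contains enough spend variation to produce tight estimates is a separate question.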

Can I calibrate my MMM without running a controlled experiment? {#faq-mmm-calibration}

Yes. Natural experiments in historical data — periods where spend on a channel shifted significantly for reasons unrelated to revenue trends — function as observational quasi-experiments. Identifying and isolating those periods allows you to estimate causal lift without a controlled holdout. The resulting iROI can then be used as a calibration prior in your Bayesian MMM, anchoring the model's channel contribution estimates to causal evidence rather than letting it fit freely to spend-revenue correlation. The calibrated model produces more reliable budget recommendations and is less likely to amplify attribution bias that exists in platform-reported data.

How is iROI different from ROAS? {#faq-iroi-vs-roas}

ROAS (Return on Ad Spend) counts all revenue associated with a channel — including revenue that would have occurred anyway without the ad spend. iROI (Incremental ROI) counts only the revenue caused by the ad spend — the lift above the counterfactual baseline. The gap is the organic baseline: revenue from customers who would have converted through branded search, direct, or word-of-mouth regardless of your paid activity. ROAS systematically overstates contribution because it assigns organic revenue to paid channels whenever a touchpoint exists. iROI isolates the causal portion. For channels with high organic baseline traffic — branded search, retargeting — the ROAS-to-iROI gap is largest. For upper-funnel channels reaching genuinely new audiences, the two can converge, but only when attribution windows are long enough to capture delayed conversion.

How long does a geo-experiment actually take? {#faq-geo-experiment-duration}

A properly powered geo-experiment — from geo selection through power analysis, holdout execution, and analysis — typically takes 8 to 16 weeks from design to results. Add 2 to 4 weeks for internal alignment and vendor or platform coordination. For most teams, that means 3 to 5 months before a single channel has validated incremental lift data. Seasonal constraints reduce the usable test windows further: running a holdout during Q4 distorts results; starting in January often means results arrive in late Q1 or Q2. The calendar math is one of the primary reasons teams look for calibration alternatives that can move faster.

What is the minimum dataset size to detect meaningful experiments? {#faq-minimum-dataset}

The minimum useful dataset is 12 weeks of weekly, geo-structured data with clear, sustained spend variation in at least some channels. Your data needs to show periods where channel spend shifted clearly and sustainably relative to its own baseline — small or gradual changes produce wide confidence intervals or no actionable estimates. In practice, 24 to 52 weeks of history with multiple variation events per channel is where the tool produces calibration-grade estimates. Teams with very stable spend histories — budgets that rarely shift significantly — will see fewer detectable events and should treat results as directional signals rather than direct MMM calibration inputs.

Does this replace geo-experiments entirely, or complement them? {#faq-replace-or-complement}

It complements them. Historical detection is fast and zero-cost to run, making it the right first step when calibration data is needed quickly or across multiple channels simultaneously. Controlled geo-experiments remain the gold standard for channels with weak historical signal, for genuinely new channels with no spend history to mine, and for decisions large enough to justify the time cost of a controlled design. The practical sequencing: run historical detection first to get immediate calibration across your full channel set, identify which channels returned wide confidence intervals, and direct geo-experiment capacity toward exactly those channels where the historical data could not resolve the question.

What Instant Incrementality does not measure. The tool produces marginal iROI estimates at the observed spend level. It does not produce a saturation curve, an average channel ROI, or a response function. It covers channels with meaningful spend variation across geographies in your history; channels that ran with flat budgets or uniformly across all markets will not yield estimates. It does not account for adstock or carryover on long-lag channels, and it does not produce a budget allocation plan. Estimates are quasi-experimental, not randomized: they are conditional on the quasi-experimental assumption described above, namely that the detected spend changes were not themselves caused by anticipated demand shifts.

Cassandra is an AI-first Bayesian MMM and incrementality platform. The Instant Incrementality tool is free to use at cassandra.app/free-measurement-tools/instant-incrementality.

Copyright © 2025 – All Rights Reserved
