How to Detect iROI Without Waiting Weeks for a Geo Test

Detect true incremental ROI per channel from historical spend data — in less than a coffee break. The methodology, worked outputs, and when to use each approach.

Gabriele Franco, Co-Founder & CEO, Cassandra

At a glance

You can detect true incremental ROI per channel from your existing historical spend data without running a geo-experiment. The method works by identifying periods in your past data where channel spend shifted significantly — creating natural variation events that function as observational quasi-experiments. Run against those periods, a detection algorithm returns iROI estimates with confidence intervals, channel by channel, in minutes rather than months. This is not a shortcut that trades accuracy for speed. It is a different experimental design — one that exploits variation that already occurred in your data rather than manufacturing new variation through a controlled holdout. The result is a calibration signal you can use immediately to anchor your Marketing Mix Model to causal reality.

Why ROAS does not measure what you think it measures

Platform-reported ROAS counts every conversion associated with an ad impression or click as a caused conversion. It does not ask: would this revenue have occurred without the ad? The revenue from a branded search click that a customer would have completed anyway is counted the same as revenue from an upper-funnel video that genuinely shifted purchase intent.

The gap between reported ROAS and true incremental contribution is not a rounding error. In Marketing Mix Models built on the Cassandra platform, the median spread between attribution-reported channel contribution and MMM-derived incremental contribution is substantial. For the channels attribution credits most heavily, the bias runs in one direction: attribution overstates, and incrementality corrects downward.

iROI (incremental ROI) is the metric that answers the counterfactual: what revenue would you have lost if you had spent zero on this channel? ROAS cannot answer that question. It was not designed to. The difference matters because budget allocation decisions are counterfactual by nature — every dollar moved from one channel to another is a bet on what would have happened.
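To make the counterfactual concrete, here is a minimal Python sketch with hypothetical numbers (not from any Cassandra dataset). `roas` and `iroi` are illustrative helpers: ROAS divides all attributed revenue by spend, while iROI divides only the revenue above the zero-spend counterfactual.

```python
def roas(attributed_revenue: float, spend: float) -> float:
    """Platform-style ROAS: every attributed dollar is treated as caused by the ad."""
    return attributed_revenue / spend


def iroi(revenue_with_spend: float, revenue_without_spend: float, spend: float) -> float:
    """Incremental ROI: only revenue above the zero-spend counterfactual counts."""
    return (revenue_with_spend - revenue_without_spend) / spend


# Hypothetical channel: $100k spend, $500k of revenue attributed to it,
# of which $420k would have arrived even with zero spend.
spend = 100_000
attributed = 500_000
counterfactual = 420_000

print(f"ROAS: {roas(attributed, spend):.2f}")                  # ROAS: 5.00
print(f"iROI: {iroi(attributed, counterfactual, spend):.2f}")  # iROI: 0.80
```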

If your MMM is calibrated with attribution data rather than incremental evidence, the model learns to replicate the attribution bias, not the true causal structure. The budget recommendations it produces will be wrong in a predictable direction: over-allocating to channels with high attribution credit, under-allocating to channels whose contribution is largely uncredited.

For context on how risk and variance compound this problem across channels, see the risk-adjusted returns framework, which examines why even a correctly measured ROI can lead to poor budget decisions if it ignores volatility.

What a geo-experiment actually measures — and why it takes months

A geo-experiment is a controlled holdout: you select matched geographic markets, hold out ad spend in a treatment group, maintain spend in a control group, and measure the revenue difference. Done correctly, it produces a clean causal estimate of lift for one channel in one geographic context over the test period.

The problem is operational, not methodological. A properly powered geo-experiment requires:

Design phase (2–4 weeks): Geo selection based on market match quality, power analysis to determine how many markets and what test duration you need to detect a meaningful effect, internal alignment on the question being tested and the channel being held out.

Execution phase (4–8 weeks): The holdout must run long enough to accumulate statistical power. Most well-designed tests require 4–8 weeks minimum. Seasonal contamination, promotional events, or platform algorithm changes during the window can invalidate results.

Analysis phase (2–4 weeks): Causal inference on geo-level data, adjusting for pre-period trends and market-specific confounders, followed by extrapolation to your full media mix.

Total timeline: 8 to 16 weeks from design to results in a best-case scenario. In practice, most teams report 3 to 5 months from decision to actionable output, once internal coordination and seasonal constraints are factored in.

That timeline means a channel you need to evaluate today will not have validated incremental lift data until Q3, Q4, or beyond. For a team trying to calibrate an MMM ahead of a budget planning cycle, the geo-experiment is usually too slow to be the first answer.

Natural experiments already exist in your historical spend data

Your spend data already contains variation — it just was not controlled. Every time your channel budget shifted significantly, a natural variation event occurred. If you cut Meta spend by 40% in March for internal reasons, then ramped it back in May, that ramp-down and ramp-up is a quasi-experiment. Revenue moved in the same direction, or it did not. The pattern of that movement — controlling for other channels, seasonality, and baseline trends — contains causal signal about Meta's incremental contribution.

This is not a new idea in econometrics. Natural experiments and difference-in-differences estimation have been foundational to causal inference research for decades. What has changed is the tooling: identifying valid natural experiment periods in noisy marketing data, and running the appropriate causal model against them, used to require a data science team and days of work. It can now be automated.
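The textbook form of this idea is a two-by-two difference-in-differences. The sketch below uses hypothetical geo-group averages and illustrates the econometric technique only; the post does not specify the model the tool runs, and `GeoGroup` and `did_iroi` are illustrative names. It divides the revenue difference-in-differences by the spend difference-in-differences, so the result reads as incremental revenue per incremental dollar.

```python
from dataclasses import dataclass


@dataclass
class GeoGroup:
    """Average weekly spend and revenue for a group of geos, before and after an event."""
    spend_pre: float
    spend_post: float
    revenue_pre: float
    revenue_post: float


def did_iroi(treated: GeoGroup, control: GeoGroup) -> float:
    """Difference-in-differences estimate of incremental revenue per incremental dollar."""
    # Revenue change in treated geos, net of the change control geos saw anyway
    delta_revenue = (treated.revenue_post - treated.revenue_pre) - (
        control.revenue_post - control.revenue_pre
    )
    # Spend change in treated geos, net of any spend change in control geos
    delta_spend = (treated.spend_post - treated.spend_pre) - (
        control.spend_post - control.spend_pre
    )
    if delta_spend == 0:
        raise ValueError("no differential spend change; nothing to estimate")
    return delta_revenue / delta_spend


# Hypothetical 40% Meta cut in treated geos; control geos kept spend flat.
treated = GeoGroup(spend_pre=10_000, spend_post=6_000, revenue_pre=80_000, revenue_post=70_000)
control = GeoGroup(spend_pre=10_000, spend_post=10_000, revenue_pre=82_000, revenue_post=81_000)
print(f"iROI = {did_iroi(treated, control):.2f}")  # iROI = 2.25
```

Here control geos also dipped by $1,000 a week, so only $9,000 of the treated geos' $10,000 weekly drop is attributed to the $4,000 weekly cut.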

The conditions for a valid natural experiment in your spend data are:

A channel's spend shifted meaningfully and sustainably relative to its own baseline
The change was not caused by a simultaneous revenue shock (which would create reverse causality)
The data contains enough structural variation to support a clean causal comparison
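One common screen for the second condition is a pre-trend check: before accepting an event, confirm that the revenue gap between affected and unaffected geos was flat in the weeks before spend moved. A gap that was already widening suggests revenue was driving spend, not the reverse. The helper name and the 1%-per-week tolerance below are hypothetical choices for illustration, and a flat pre-trend cannot rule out a demand shift that coincides exactly with the spend change.

```python
import numpy as np


def pre_trend_ok(treated_rev: np.ndarray, control_rev: np.ndarray, max_rel_slope: float = 0.01) -> bool:
    """Return True if the treated-minus-control revenue gap was flat before the spend change.

    treated_rev, control_rev: weekly revenue over the pre-period only.
    max_rel_slope: hypothetical tolerance on the gap's weekly slope,
                   expressed as a share of mean treated revenue.
    """
    gap = treated_rev - control_rev
    weeks = np.arange(len(gap))
    # Linear fit of the gap over time; polyfit returns [slope, intercept] for deg=1
    slope, _intercept = np.polyfit(weeks, gap, deg=1)
    return bool(abs(slope) <= max_rel_slope * treated_rev.mean())


control = np.array([80_000, 81_000, 79_500, 80_500, 80_000, 81_000], dtype=float)
flat = control + 2_000                   # stable gap: passes
rising = control + 3_000 * np.arange(6)  # treated pulling away before the change: fails
print(pre_trend_ok(flat, control), pre_trend_ok(rising, control))  # True False
```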

When those conditions are met, the variation event can be used to estimate the channel's causal contribution to revenue in that period — a causal estimate, not an attributed one.

The detection methodology: how Cassandra identifies quasi-experimental periods

The Instant Incrementality tool runs the full detection automatically. You upload a CSV and results appear in minutes. You don't configure any of this — but for transparency, here is what happens:

Detect. The algorithm scans your channel-level spend history for periods where spend shifted meaningfully and sustainably relative to its own baseline. Both increases and decreases qualify. Each valid period becomes a candidate for analysis.
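A minimal sketch of this scan for one channel's weekly spend series (run per geo in a geo-structured dataset) might compare each candidate window against a trailing baseline of equal length. The function name, the 4-week window, and the 30% threshold are hypothetical, not the tool's settings. "Sustained" is taken here to mean every week in the post-window sits on the same side of the baseline, which filters out one-week spikes.

```python
import numpy as np


def detect_spend_shifts(spend, window: int = 4, min_rel_change: float = 0.3) -> list[tuple[int, float]]:
    """Flag weeks where spend shifts sustainably relative to its own trailing baseline.

    Returns (week_index, relative_change) pairs. `window` and `min_rel_change`
    are hypothetical defaults for illustration.
    """
    spend = np.asarray(spend, dtype=float)
    events = []
    week = window
    while week + window <= len(spend):
        baseline = spend[week - window:week].mean()
        after = spend[week:week + window]
        rel_change = after.mean() / baseline - 1 if baseline > 0 else np.inf
        # Sustained: every week in the post-window moved in the same direction
        sustained = np.all(after > baseline) or np.all(after < baseline)
        if abs(rel_change) >= min_rel_change and sustained:
            events.append((week, round(float(rel_change), 3)))
            week += window  # skip ahead so one shift is not reported repeatedly
        else:
            week += 1
    return events


# 40% cut in week 8, back to the original level in week 16
print(detect_spend_shifts([100] * 8 + [60] * 8 + [100] * 8))  # [(8, -0.4), (16, 0.667)]
```

Both the ramp-down and the ramp-up are flagged, matching the point that increases and decreases both qualify.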

Isolate and model. Each candidate is checked for the structural conditions needed for clean causal estimation. Events that do not meet those conditions are excluded. For the events that pass, a causal inference model estimates the revenue that would have occurred without the spend change, and measures the actual response against it. The output is a per-event causal estimate of incremental return per dollar.
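One simple way to build that counterfactual, in the spirit of synthetic control, is to regress the affected geo's revenue on revenue in unaffected control geos over the pre-period, then project the relationship forward. The sketch below assumes that approach; the post does not say which causal model the tool uses, and `event_iroi` and its pre-period-mean spend baseline are illustrative assumptions.

```python
import numpy as np


def event_iroi(treated_rev, control_rev, treated_spend, event_week: int) -> float:
    """Per-event iROI using control geos to predict the treated geo's counterfactual revenue.

    treated_rev:   (T,) weekly revenue in the geo where spend changed
    control_rev:   (T, K) weekly revenue in K control geos with unchanged spend
    treated_spend: (T,) weekly spend in the treated geo
    event_week:    index of the first week after the spend change
    """
    treated_rev = np.asarray(treated_rev, dtype=float)
    control_rev = np.asarray(control_rev, dtype=float)
    treated_spend = np.asarray(treated_spend, dtype=float)
    X = np.column_stack([np.ones(len(treated_rev)), control_rev])  # intercept + controls
    pre, post = slice(0, event_week), slice(event_week, None)

    # Fit the pre-period relationship between treated and control revenue
    coef, *_ = np.linalg.lstsq(X[pre], treated_rev[pre], rcond=None)
    counterfactual = X[post] @ coef  # revenue expected had spend not changed

    incremental_revenue = (treated_rev[post] - counterfactual).sum()
    # Spend change measured against the pre-period average (an assumption)
    incremental_spend = (treated_spend[post] - treated_spend[pre].mean()).sum()
    return incremental_revenue / incremental_spend


c1 = np.array([50, 52, 49, 51, 53, 50, 48, 52, 51, 50, 49, 53]) * 1_000.0
c2 = np.array([30, 31, 29, 32, 30, 31, 33, 30, 29, 31, 32, 30]) * 1_000.0
controls = np.column_stack([c1, c2])
spend = np.where(np.arange(12) < 8, 10_000.0, 6_000.0)
# Hypothetical treated geo: tracks controls exactly, then loses $9k/week after a $4k/week cut
treated = 5_000 + 0.6 * c1 + 0.4 * c2 - np.where(np.arange(12) < 8, 0.0, 9_000.0)
print(f"{event_iroi(treated, controls, spend, event_week=8):.2f}")  # 2.25
```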

Aggregate and export. Estimates across all valid events for a channel are combined into a single iROI estimate with a 90% confidence interval. Wider intervals indicate fewer or noisier events; tighter intervals indicate more and cleaner evidence. The resulting iROI estimates can be used directly as calibration priors in a Bayesian MMM, anchoring the model to causal evidence rather than spend-revenue correlation.
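A standard way to combine per-event estimates is fixed-effect inverse-variance pooling, where more precise events carry more weight. This is an assumed method for illustration (`pool_events` is a hypothetical helper, and the inputs are made up); the post does not describe the tool's aggregation. A normal interval like this one is always symmetric around the point estimate, so it would not reproduce an asymmetric interval such as Search's ($3.50–$4.38 around $4.09) in the worked examples.

```python
import math


def pool_events(estimates: list[float], std_errors: list[float], z: float = 1.645):
    """Combine per-event iROI estimates into one estimate with a two-sided 90% interval.

    Fixed-effect inverse-variance weighting: each event is weighted by 1 / SE^2.
    z = 1.645 is the normal quantile for a two-sided 90% interval.
    """
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - z * pooled_se, pooled + z * pooled_se)


est, (lo, hi) = pool_events([3.9, 4.3, 4.0], [0.4, 0.5, 0.3])
print(f"{est:.2f} ({lo:.2f}, {hi:.2f})")  # 4.03 (3.67, 4.38)
```

Adding more events shrinks `pooled_se`, which is the mechanism behind "more and cleaner evidence" producing tighter intervals.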

No SQL, no data science environment, no vendor engagement required.

Your past data already contains an experiment.

Every time your channel spend shifted, a natural variation event occurred. Instant Incrementality detects those events and estimates your true incremental return — free, in minutes. You need geo-structured spend data (channel spend and revenue by geographic market, not national totals) and at least 12 weeks of history.

Try it on your data

Reading iROI outputs by channel: worked examples from the Instant Incrementality tool

The following outputs are drawn from the Instant Incrementality tool's demo dataset — a representative spend and revenue history across three channels. They illustrate how to read and act on iROI estimates.

Search — iROI: $4.09 per $1 (90% CI: $3.50–$4.38)

At the observed spend level, every dollar spent on Search returns $4.09 in incremental revenue. The confidence interval is tight ($0.88 wide), which means the estimate is stable across multiple quasi-experimental events in the history. The recommendation signal: scale it. Even the lower bound of $3.50 clears breakeven for any business with a contribution margin above roughly 29% (1 ÷ 3.50), so the estimate is reliable enough to justify increased allocation.

Performance Max — iROI: $2.49 per $1 (90% CI: $1.22–$3.58)

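PMax returns $2.49 in incremental revenue per dollar. That recovers more than the dollar spent, but the multiple is lower than Search and the interval is far wider ($2.36 wide). Whether PMax is net-positive therefore depends on your breakeven ROAS: the lower bound of $1.22 clears breakeven only at contribution margins above roughly 82%. The interpretation also depends on your budget constraint: if Search is not yet saturated, it should receive incremental budget before PMax. If Search is saturated, PMax is the next viable allocation target. Recommendation signal: scale it, but only after confirming Search is at or near its saturation point and that PMax's interval clears your breakeven ROAS.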

Video — iROI: $1.00–$2.00 per $1 (wide interval)

The Video estimate signals low detection confidence: fewer or smaller spend variation events in the history, or higher co-movement with other channels during those events. The $1.00–$2.00 range sits around the point where revenue merely covers spend, and below breakeven for any business with a contribution margin under 50%. Recommendation signal: uncertain. Because the historical signal is weak, treat this as a case for a controlled test rather than a cut: do not cut the channel based on this alone, but do not scale spend without either a controlled test or a longer historical record that produces tighter estimates.

How to act on these outputs

One important note on interpretation: these are marginal estimates — they measure the incremental return at the observed level of spend change, not your channel's average historical ROI. Under diminishing returns, marginal return is lower than average return. Scaling spend means moving along the saturation curve — expect marginal return to decrease as you increase allocation beyond the observed level.
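The gap between marginal and average return is easy to see on any concave response curve. The sketch below assumes a hypothetical log-shaped curve with made-up parameters `a` and `k`; it is not fitted to any real channel. At every spend level the marginal return (the curve's derivative) sits below the average return, and it falls as spend rises.

```python
import math


def revenue(spend: float, a: float = 500_000, k: float = 100_000) -> float:
    """Hypothetical concave response curve: incremental revenue generated by `spend`."""
    return a * math.log1p(spend / k)


def marginal_return(spend: float, a: float = 500_000, k: float = 100_000) -> float:
    """Derivative of revenue(): return on the next dollar at this spend level."""
    return a / (k + spend)


for s in (100_000, 200_000):
    avg = revenue(s) / s
    print(f"spend ${s:,}: average {avg:.2f}, marginal {marginal_return(s):.2f}")
# spend $100,000: average 3.47, marginal 2.50
# spend $200,000: average 2.75, marginal 1.67
```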

The decision framework keys off your channel's breakeven ROAS, not a fixed dollar threshold:

Condition	Action
CI lower bound > your channel's breakeven ROAS	Scale — incremental return clears the breakeven threshold even in the conservative scenario
CI straddles your breakeven ROAS	Hold — the evidence does not support scaling or cutting
CI upper bound < your channel's breakeven ROAS	Cut or pause — the channel does not cover its cost even in the optimistic scenario
Window not statistically significant or data insufficient	Run a controlled test — the historical signal is not clean enough to act on
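The table translates directly into a small decision function. `recommend` is a hypothetical helper, not part of the tool, and the intervals in the example are made up. It checks the data-sufficiency row first, since an interval from an unreliable window should not drive a scale or cut call.

```python
def recommend(ci_low: float, ci_high: float, breakeven: float, signal_ok: bool = True) -> str:
    """Map an iROI confidence interval and a breakeven ROAS to an action."""
    if not signal_ok:
        # Insufficient data or a non-significant window overrides the interval
        return "Run a controlled test"
    if ci_low > breakeven:
        return "Scale"  # clears breakeven even in the conservative scenario
    if ci_high < breakeven:
        return "Cut or pause"  # misses breakeven even in the optimistic scenario
    return "Hold"  # interval straddles breakeven


breakeven = 2.5  # e.g. a 40% contribution margin
for low, high in [(3.5, 4.4), (2.0, 4.0), (1.0, 2.0)]:
    print((low, high), recommend(low, high, breakeven))
```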

In ROAS terms, breakeven = 1 ÷ your contribution margin. A business with a 30% contribution margin needs to recover about $3.33 in incremental revenue for every $1 spent just to break even on that channel.
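The breakeven figure follows from one line of algebra: $1 of spend that brings in R of incremental revenue earns margin × R in contribution, which covers the dollar when R = 1 ÷ margin. A minimal sketch (`breakeven_roas` is an illustrative name):

```python
def breakeven_roas(contribution_margin: float) -> float:
    """Incremental revenue per $1 of spend needed for contribution to cover that $1."""
    if not 0 < contribution_margin <= 1:
        raise ValueError("contribution margin must be in (0, 1]")
    return 1 / contribution_margin


for margin in (0.20, 0.30, 0.50):
    print(f"{margin:.0%} margin -> breakeven {breakeven_roas(margin):.2f}")
# 20% margin -> breakeven 5.00
# 30% margin -> breakeven 3.33
# 50% margin -> breakeven 2.00
```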

For a portfolio-level view of how iROI calibration interacts with channel correlation and risk, see the Marketing Efficient Frontier framework.

Image suggestion 1: Screenshot of Instant Incrementality tool output panel showing the three-channel iROI estimates with confidence intervals and recommendation labels. Alt text: "Instant Incrementality tool output showing iROI estimates by channel with 90% confidence intervals — Search 4.09, PMax 2.49, Video 1–2."

When to use historical detection vs. a full geo-experiment

Neither method is universally superior. They answer slightly different questions and suit different operational constraints.

Use historical detection when:

You need calibration data within days, not months — budget cycle, board presentation, or MMM build is imminent
The channel has been active for 12+ weeks with clear spend variation in its history
You want a fast first-pass estimate before deciding whether a controlled test is worth the time cost
You have multiple channels to evaluate simultaneously — geo-experiments typically isolate one channel at a time

Use a controlled geo-experiment when:

The channel is new — no spend history exists to mine for natural experiments
The historical estimate has a very wide confidence interval, indicating the signal is too weak to act on
The budget decision is large enough that a few months of delay is worth the precision gain
The channel's contribution is contested internally and you need an unambiguous controlled result to resolve the debate

Use both in sequence:

The most rigorous approach is to run historical detection first, generate iROI estimates, and use those estimates to prioritize which channels to validate with a controlled experiment. Channels with wide confidence intervals from historical detection are exactly the ones where a geo-experiment will produce the most incremental learning. Channels with tight, high-confidence intervals may not need a controlled test, because the natural experiment data is sufficient to calibrate the MMM.

This sequenced approach gets you calibration data immediately, while directing the expensive and slow geo-experiment budget toward the questions where it adds the most value.
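The prioritization step can be sketched as a ranking by interval width. This is a hypothetical helper with made-up channel results; it uses width relative to the point estimate as one possible measure of how unresolved a channel is, which is an assumption rather than a rule from the tool.

```python
def geo_test_priority(channels: dict[str, tuple[float, float, float]]) -> list[str]:
    """Order channels for controlled testing, widest relative interval first.

    channels maps name -> (point_estimate, ci_low, ci_high). Relative width is
    (ci_high - ci_low) / point_estimate, so a $1 spread counts for more on a
    $1.50 estimate than on a $4.00 one.
    """
    def relative_width(name: str) -> float:
        est, low, high = channels[name]
        return (high - low) / est

    return sorted(channels, key=relative_width, reverse=True)


# Hypothetical historical-detection output
results = {
    "Channel A": (4.0, 3.6, 4.4),
    "Channel B": (2.0, 0.8, 3.4),
    "Channel C": (1.5, 1.0, 2.1),
}
print(geo_test_priority(results))  # ['Channel B', 'Channel C', 'Channel A']
```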

A note on the quasi-experimental assumption. Confidence intervals from historical detection are conditional on one assumption: that the spend changes being detected were not themselves caused by anticipated demand shifts. If a channel's budget increased because the team expected that period to be strong regardless of spend, the detection algorithm cannot distinguish that from a genuine causal lift. Where the business decision is high-stakes enough to make this uncertainty consequential, a controlled experiment remains the more defensible choice.

Image suggestion 2: Decision tree diagram — "Do I have 12+ weeks of geo-structured spend history?" → Yes → "Did spend shift significantly for any channel?" → Yes → "Run historical detection." No → "Run geo-experiment." Alt text: "Decision tree for choosing between historical incrementality detection and geo-experiment based on data availability and spend variation."

Image suggestion 3: Timeline comparison showing geo-experiment (8–16 weeks) vs. historical detection (minutes to days) side by side, with tradeoff annotation on CI width. Alt text: "Timeline comparison: geo-experiment takes 8–16 weeks; historical natural experiment detection returns results in minutes, with wider confidence intervals where data is sparse."

Frequently Asked Questions

How accurate is detecting natural experiments in historical spend data?

Accuracy depends on how much clean, unconfounded spend variation exists in your history. The tool reports 90% confidence intervals on all estimates — the interval width is the accuracy signal. A tight interval means the data contains enough clean variation to support a confident estimate; a wide interval means the result is directional only and should not be used as a direct MMM calibration input without further validation. The tool does not hide uncertainty — it surfaces it explicitly so you can decide whether to act or to run a controlled test.

What data do I need to run incrementality detection on past spend?

Your data needs geographic structure: channel spend and revenue split by geographic market, not national totals. Nationally aggregated data does not contain enough variation to detect clean natural experiments. Beyond structure, you need at least 12 weeks of weekly history with meaningful spend variation in the channels you want to evaluate. A 24-week or 52-week history with multiple spend variation events per channel produces tighter estimates and better calibration-grade output.

Can I calibrate my MMM without running a controlled experiment?

Yes. Natural experiments in historical data — periods where spend on a channel shifted significantly for reasons unrelated to revenue trends — function as observational quasi-experiments. Identifying and isolating those periods allows you to estimate causal lift without a controlled holdout. The resulting iROI can then be used as a calibration prior in your Bayesian MMM, anchoring the model's channel contribution estimates to causal evidence rather than letting it fit freely to spend-revenue correlation. The calibrated model produces more reliable budget recommendations and is less likely to amplify attribution bias that exists in platform-reported data.

How is iROI different from ROAS?

ROAS (Return on Ad Spend) counts all revenue associated with a channel — including revenue that would have occurred anyway without the ad spend. iROI (Incremental ROI) counts only the revenue caused by the ad spend — the lift above the counterfactual baseline. The gap is the organic baseline: revenue from customers who would have converted through branded search, direct, or word-of-mouth regardless of your paid activity. ROAS systematically overstates contribution because it assigns organic revenue to paid channels whenever a touchpoint exists. iROI isolates the causal portion. For channels with high organic baseline traffic — branded search, retargeting — the ROAS-to-iROI gap is largest. For upper-funnel channels reaching genuinely new audiences, the two can converge, but only when attribution windows are long enough to capture delayed conversion.

How long does a geo-experiment actually take?

A properly powered geo-experiment, from geo selection through power analysis, holdout execution, and analysis, typically takes 8 to 16 weeks from design to results. Add 2 to 4 weeks for internal alignment and vendor or platform coordination. For most teams, that means 3 to 5 months before a single channel has validated incremental lift data. Seasonal constraints reduce the usable test windows further: running a holdout during Q4 can distort results, and starting in January often means results arrive in late Q1 or Q2. The calendar math is one of the primary reasons teams look for calibration alternatives that can move faster.

What is the minimum dataset size to detect meaningful experiments?

The minimum useful dataset is 12 weeks of weekly, geo-structured data with clear, sustained spend variation in at least some channels. Your data needs to show periods where channel spend shifted clearly and sustainably relative to its own baseline — small or gradual changes produce wide confidence intervals or no actionable estimates. In practice, 24 to 52 weeks of history with multiple variation events per channel is where the tool produces calibration-grade estimates. Teams with very stable spend histories — budgets that rarely shift significantly — will see fewer detectable events and should treat results as directional signals rather than direct MMM calibration inputs.

Does this replace geo-experiments entirely, or complement them?

It complements them. Historical detection is fast and zero-cost to run, making it the right first step when calibration data is needed quickly or across multiple channels simultaneously. Controlled geo-experiments remain the gold standard for channels with weak historical signal, for genuinely new channels with no spend history to mine, and for decisions large enough to justify the time cost of a controlled design. The practical sequencing: run historical detection first to get immediate calibration across your full channel set, identify which channels returned wide confidence intervals, and direct geo-experiment capacity toward exactly those channels where the historical data could not resolve the question.

Your past data is already an experiment.

Most teams wait months for geo-experiments before they trust their budget decisions. Your historical spend already contains the signal you need to calibrate now, act faster, and build toward predictable growth. Instant Incrementality extracts that signal — no holdout setup, no waiting period. You need geo-structured spend data and at least 12 weeks of history.

Try it on your data

What Instant Incrementality does not measure. The tool produces marginal iROI estimates at the observed spend level — not a saturation curve, not average channel ROI, not a response function. It covers channels with meaningful spend variation across geographies in your history; channels that ran with flat budgets or uniformly across all markets will not yield estimates. It does not account for adstock or carryover on long-lag channels, and it does not produce a budget allocation plan. Estimates are quasi-experimental, not randomized — they are conditional on the counterfactual assumption described above.

See how to grow, don't guess it

Unlock measurements and decisions that will unlock growth for your brand
