
How to Measure Abandoned Cart Email Incrementality: Step-by-Step

Step-by-step guide to measure true incrementality of abandoned cart emails—randomized holdouts, contamination control, sample-size planning, and RPR-based analysis.

True incrementality tells you how many extra orders and dollars your abandoned cart flow creates versus what would have happened anyway. That’s different from gross “recovered revenue,” which counts any order that happened after an email—even if the customer would’ve purchased without it. Your decision metric should be causal: revenue per recipient (RPR) or conversion lift.

Two ways teams commonly measure the same flow produce very different stories:

Measure | What it says
Gross recovered revenue | Revenue attributed to sends without a counterfactual; inflated by natural purchases
Incrementality (lift) | Additional orders/revenue caused by the flow vs. a randomized holdout

Key takeaways

  • Randomized, persistent user-level holdouts are the gold standard for measuring abandoned cart email incrementality.

  • Plan for contamination control: suppress holdouts from SMS and ads, and exit-on-purchase within minutes.

  • Power on one primary decision metric (RPR or conversion rate) and run 2–4 weeks, or until pre-registered sample thresholds are met.

  • Analyze conversion lift, incremental revenue, and RPR; report confidence intervals and guardrail metrics.

  • Make go/no-go calls with pre-defined rules; then iterate cadence, creative, and incentives.

Step 0 — Define eligibility and your decision metric

Start with a clean cohort and a clear primary metric.

  • Eligibility: people who reached checkout (contact captured) and did not place an order before your first-send delay elapsed. On Shopify, the AbandonedCheckout object and its related Admin GraphQL queries are the canonical signals for these events; see the Shopify AbandonedCheckout documentation.

  • First-send delay: 1–4 hours (many brands land at 2–4) to let natural purchases complete without messaging friction. For timing heuristics and RPR by touch cohorts, see the timing benchmarks and templates in the guide on abandoned cart timing cohorts and benchmarks.

  • Exit-on-purchase: remove customers from the flow as soon as an order is created—aim for a ≤5-minute SLA so you don’t send post-purchase.

  • Primary metric: choose RPR if your AOV and discounting vary by segment; choose conversion lift if your goal is more orders regardless of AOV. Report both, but pick one as the decision driver.
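The eligibility rule above can be written as a small predicate. This is a minimal sketch: the function name, its parameters, and the 2-hour default are assumptions for illustration, not Shopify or Klaviyo fields.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_eligible(
    checkout_ts: datetime,
    next_order_ts: Optional[datetime],
    has_contact: bool,
    first_send_delay: timedelta = timedelta(hours=2),  # assumed default
) -> bool:
    """Return True if this checkout should enter the experiment cohort.

    next_order_ts is the first order placed at or after checkout_ts,
    or None if the person has not ordered since.
    """
    if not has_contact:
        return False  # no email captured, so the flow could never reach them
    first_send_ts = checkout_ts + first_send_delay
    # An order placed before the first send is a natural purchase:
    # the person is excluded from both arms, not counted as a conversion.
    if next_order_ts is not None and next_order_ts <= first_send_ts:
        return False
    return True
```

Applying the same predicate to both arms before assignment keeps the holdout and treatment cohorts comparable.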

For send cadence structure (e.g., 1h/24h/72h frameworks) and how to test timing safely, see this practical overview of abandoned cart email timing frameworks.

Step 1 — Set up a persistent randomized holdout

A user-level holdout ensures apples-to-apples comparison across identical eligibility. In Klaviyo, you have two robust options.

Option A: Global Holdout Groups

  • Create a global holdout percentage and persist assignments across flows. This is the simplest path if you have permission to set account-wide experiments. Klaviyo documents this in their help center; see the guide to Klaviyo Global Holdout Groups.

Option B: Persisted profile property + flow split

  • Assign each eligible profile to “treatment” or “holdout” using a deterministic hash (so users keep the same assignment across touches and sessions) and split your abandoned cart flow by that property.

Deterministic hashing examples (use one consistent identifier, like email or Shopify customer_id):

# Python
  import hashlib
  
  def assign_group(identifier: str, holdout_percent: int = 15) -> str:
      h = hashlib.md5(identifier.strip().lower().encode()).hexdigest()
      bucket = int(h[:8], 16) % 100
      return "holdout" if bucket < holdout_percent else "treatment"
  
// Node.js only (uses the built-in crypto module; browsers lack createHash, and Web Crypto does not offer MD5)
  import crypto from 'crypto';
  
  function assignGroup(identifier, holdoutPercent = 15) {
    const hash = crypto
      .createHash('md5')
      .update(String(identifier).trim().toLowerCase())
      .digest('hex');
    const bucket = parseInt(hash.slice(0, 8), 16) % 100;
    return bucket < holdoutPercent ? 'holdout' : 'treatment';
  }
  

Then persist the assignment to the ESP profile and branch your flow:

// Example: Klaviyo Profiles API (conceptual pattern; real requests also need an Authorization header and a revision header)
  POST https://a.klaviyo.com/api/profiles/
  {
    "data": {
      "type": "profile",
      "attributes": {
        "email": "jane@example.com",
        "properties": {
          "ac_incrementality_group": "holdout"
        }
      }
    }
  }
  

In the abandoned cart flow, add a conditional split on Properties about someone where ac_incrementality_group equals "treatment" to send messages; route "holdout" down a no-send path that only evaluates exits and logging.

Control size: many teams start at 10–20% and adjust based on power and business tolerance.

Step 2 — Control contamination across channels

To measure true lift, the control group must not be influenced by parallel recovery efforts.

  • Suppress SMS and paid retargeting for holdouts during the experiment window. Digital Applied recommends unified randomized holdouts across recovery channels to isolate persuasion effects; see the 2026 playbook on abandoned cart recovery and holdouts.

  • Freeze creative, cadence, and incentives for the duration. Mid-test changes blur the treatment effect.

  • Respect quiet hours and align local send windows for both arms.

  • Deduplicate multi-abandon events so each person has one active assignment across overlapping sessions.
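The deduplication rule in the last bullet can be sketched as follows. The event shape (a `person_id` paired with a timestamp) is an assumption; adapt it to whatever your event export looks like.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple


def first_abandon_per_person(
    events: Iterable[Tuple[str, datetime]],
) -> Dict[str, datetime]:
    """Keep only each person's earliest abandonment in the experiment window.

    events is an iterable of (person_id, abandon_ts) pairs (hypothetical shape).
    """
    first_seen: Dict[str, datetime] = {}
    for person_id, ts in events:
        # Later abandonments by the same person do not open a new assignment.
        if person_id not in first_seen or ts < first_seen[person_id]:
            first_seen[person_id] = ts
    return first_seen
```

Anchoring each person to their first abandonment means one eligibility timestamp and one arm per person, however many carts they abandon during the test.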

Step 3 — Plan sample size and runtime

Power your test on a single primary metric and pre-register your minimum detectable effect (MDE).

  • Conversion-rate power: use a two-proportion calculator with baseline checkout-to-order rate and desired MDE. A reputable starting point is the Optimizely sample size calculator.

  • RPR power: revenue is noisy; plan larger samples using mean-difference power formulas (or a simulation). CXL’s guidance on power and runtime is pragmatic; see the overview on statistical power for experiments.

  • Runtime: run for 2–4 weeks or until you meet pre-specified sample thresholds; avoid early peeking that inflates false positives. CXL discusses safe stopping and peeking discipline in the same resource above.

  • Heuristics: if your expected lift is small (<10% RPR), increase control share or extend duration. If volume is high and effects are large, a 10% holdout may be plenty.
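For the RPR case, a mean-difference power calculation might look like the sketch below. It uses the normal approximation with a single per-profile standard deviation for both arms; per-profile revenue is mostly zeros with a long tail, so treat the result as a rough floor and prefer a simulation on your own order data when you can. The function name and inputs are assumptions.

```python
import math
from statistics import NormalDist


def rpr_sample_size(
    sd: float,
    mde_abs: float,
    holdout_share: float = 0.15,
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """Total eligible profiles needed to detect an absolute RPR difference.

    sd is the per-profile revenue standard deviation (non-buyers count as 0);
    mde_abs is the smallest RPR gap, in currency units, worth detecting.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Var(mean_t - mean_c) = sd^2 * (1/n_c + 1/n_t) = sd^2 / (N * s * (1 - s))
    # for N total profiles and holdout share s.
    n_total = z**2 * sd**2 / (mde_abs**2 * holdout_share * (1 - holdout_share))
    return math.ceil(n_total)
```

Because the required total scales with 1 / (s × (1 − s)), moving the holdout share toward 50% shrinks the total sample needed, which is the reasoning behind "increase control share" in the heuristic above.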

Worked mini-example (sanity check)

  • Baseline placed-order rate among eligible: 3.0% (from past flows or benchmarks such as Klaviyo’s abandoned cart posts).

  • Target relative lift: +20% (to 3.6%).

  • With alpha 0.05 and power 80%, you’ll typically need on the order of tens of thousands of eligible profiles across both arms (roughly 14,000 per arm with an even split; an uneven split such as 85/15 needs more in total). If that’s unrealistic, either increase MDE, lengthen the test, or power on conversion instead of RPR.
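The mini-example can be reproduced with the standard two-proportion normal approximation (unpooled variance). This is a sketch under that assumption; online calculators use slightly different formulas and will not match to the digit. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def conversion_sample_size(
    p_control: float,
    rel_lift: float,
    holdout_share: float = 0.5,
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """Total eligible profiles needed to detect a relative conversion lift."""
    p_treat = p_control * (1 + rel_lift)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Variance of the difference in rates, expressed per total profile.
    var_per_total = (
        p_control * (1 - p_control) / holdout_share
        + p_treat * (1 - p_treat) / (1 - holdout_share)
    )
    n_total = z**2 * var_per_total / (p_treat - p_control) ** 2
    return math.ceil(n_total)


# Baseline 3.0%, +20% relative lift, as in the mini-example above.
even_split = conversion_sample_size(0.03, 0.20, holdout_share=0.5)
holdout_15 = conversion_sample_size(0.03, 0.20, holdout_share=0.15)
```

With these inputs the even split comes out near 28,000 profiles in total and the 15% holdout near 51,000, which is why a small holdout usually means a longer test.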

For context on typical outcomes, see Klaviyo’s discussion of flow performance in their overview of abandoned cart benchmarks and placed order rates.

Step 4 — Launch and QA the experiment

Before you “go live,” smoke test the entire pipeline.

Pre-launch checks

  • Eligibility counts look plausible (vs. historical). New vs. returning and device/geo balance are similar across arms.

  • Holdout suppression is honored in email/SMS/ads.

  • Purchase exits remove people within ≤5 minutes of order creation.

Daily monitors

  • Eligible vs. sent vs. exited-on-purchase counts by arm.

  • Deliverability guardrails: unsubscribe, spam complaint, bounce rates.

  • Holdout leakage review: no accidental sends to control.
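One way to make the counts-by-arm monitor concrete is a sample ratio mismatch check: compare the observed split with the configured holdout share using a chi-square goodness-of-fit test. This is a sketch using only the standard library; the function name and the alert threshold in the note below are assumptions.

```python
import math


def srm_p_value(n_holdout: int, n_treatment: int, holdout_share: float) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the arm split."""
    total = n_holdout + n_treatment
    exp_holdout = total * holdout_share
    exp_treatment = total - exp_holdout
    chi2 = (
        (n_holdout - exp_holdout) ** 2 / exp_holdout
        + (n_treatment - exp_treatment) ** 2 / exp_treatment
    )
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

A very small p-value (many teams alert below 0.001) suggests the split is not what you configured, which usually points to an assignment, sync, or filtering bug rather than chance.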

Step 5 — Analyze abandoned cart email incrementality and revenue

When the window closes, lock your dataset and compute lift. At a minimum you’ll join eligibility to orders and revenue over the analysis window.

SQL sketch (conceptual; adapt to your warehouse schema)

WITH eligible AS (
    SELECT person_id, first_eligibility_ts
    FROM checkout_reached  -- Shopify checkout/contact-captured event
    -- Half-open range so all of May 31 is included
    WHERE first_eligibility_ts >= '2026-05-01' AND first_eligibility_ts < '2026-06-01'
      AND purchased_before_first_send = FALSE
  ),
  assign AS (
    -- GROUP is a reserved word in SQL, so the arm is aliased as test_group
    SELECT person_id, ac_incrementality_group AS test_group
    FROM esp_profiles
  ),
  orders AS (
    SELECT person_id, order_ts, order_value
    FROM shopify_orders
    WHERE order_ts >= '2026-05-01' AND order_ts < '2026-06-16' -- include a buffer for lag
  ),
  person_orders AS (
    -- Count only orders placed after the person became eligible
    SELECT e.person_id, MIN(o.order_ts) AS order_ts, SUM(o.order_value) AS order_value
    FROM eligible e
    JOIN orders o
      ON o.person_id = e.person_id
     AND o.order_ts >= e.first_eligibility_ts
    GROUP BY 1
  )
  SELECT
    a.test_group,  -- a NULL row here means eligible profiles with no assignment
    COUNT(DISTINCT e.person_id) AS recipients,
    COUNT(DISTINCT CASE WHEN o.order_ts IS NOT NULL THEN e.person_id END) AS purchasers,
    SUM(COALESCE(o.order_value,0)) AS revenue,
    1.0 * COUNT(DISTINCT CASE WHEN o.order_ts IS NOT NULL THEN e.person_id END) / COUNT(DISTINCT e.person_id) AS cr,
    1.0 * SUM(COALESCE(o.order_value,0)) / COUNT(DISTINCT e.person_id) AS rpr
  FROM eligible e
  LEFT JOIN assign a ON a.person_id = e.person_id
  LEFT JOIN person_orders o ON o.person_id = e.person_id
  GROUP BY 1;
  

Core calculations

  • Conversion lift: (CR_treatment − CR_control) / CR_control. Also report the absolute difference in percentage points.

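  • Incremental revenue: (RPR_treatment − RPR_control) × recipients_treatment, for matched eligibility windows. Don’t subtract raw revenue totals: with a 10–20% holdout the arms differ in size, so the totals aren’t comparable.

  • RPR: total revenue ÷ recipients, reported per touch and for the full flow. Here “recipients” means every eligible profile assigned to the arm, including holdout profiles who received nothing.

  • Confidence intervals: compute for your primary metric. Use a Wilson interval for each arm’s conversion rate and an interval on the treatment − control difference (Newcombe or normal approximation for conversion; t-interval or bootstrap for RPR). Report them alongside point estimates.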
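The core calculations can be sketched in a few lines of Python. The function names and the numbers in the usage note are hypothetical. Incremental revenue here scales the per-recipient gap by the treatment arm's size, an assumption made so that unequal arms compare fairly, and the interval on the conversion difference uses a plain normal approximation.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, conf: float = 0.95):
    """Wilson score interval for a single arm's conversion rate."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def lift_summary(n_t, conv_t, rev_t, n_c, conv_c, rev_c, conf: float = 0.95):
    """Conversion lift, RPR, and incremental revenue for treatment vs. control."""
    cr_t, cr_c = conv_t / n_t, conv_c / n_c
    rpr_t, rpr_c = rev_t / n_t, rev_c / n_c
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = cr_t - cr_c
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(cr_t * (1 - cr_t) / n_t + cr_c * (1 - cr_c) / n_c)
    return {
        "cr_treatment": cr_t,
        "cr_control": cr_c,
        "abs_lift_pp": 100 * diff,
        "rel_lift": diff / cr_c,
        "abs_lift_ci": (diff - z * se, diff + z * se),
        "rpr_treatment": rpr_t,
        "rpr_control": rpr_c,
        # Per-recipient gap scaled to the treatment arm's size.
        "incremental_revenue": (rpr_t - rpr_c) * n_t,
    }


# Hypothetical inputs: 8,500 treatment and 1,500 holdout profiles.
summary = lift_summary(8500, 306, 27880.0, 1500, 45, 4380.0)
control_ci = wilson_interval(45, 1500)
```

The inputs map directly to the `recipients`, `purchasers`, and `revenue` columns a per-arm warehouse query would return.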

Neutral, real-world workflow note: many teams centralize Shopify orders and ESP events in an attribution workspace to compute per-touch and flow-level RPR with confidence bands. A tool like Attribuly can be used to unify Shopify server-side purchase events and Klaviyo flow exposure so these lift and RPR views stay decision-grade, while still honoring your randomized holdouts.

Step 6 — Decide, report, and iterate

Pre-register decision rules and stick to them.

  • Go if the confidence interval for the lift in your primary metric (treatment minus holdout) sits entirely above zero (or meets your Bayesian threshold) and guardrails are acceptable. No-go if the interval includes zero or guardrails are breached.

  • Reporting template: show a small table with recipients, CR, RPR, lift, incremental revenue, and guardrails by arm; include a simple sparkline of time-to-purchase.

  • Narrative example: “Across 42k eligible profiles over 3.5 weeks, treatment RPR was $3.28 vs. $2.92 in holdout (+12.3% lift, 95% CI +5.1% to +19.4%). Unsub stayed at 0.09% and complaints at 0.01%. We recommend adopting v1 cadence and testing offer framing next.”

  • Next iteration ideas: adjust send delays, simplify creative, or segment incentives. If underpowered, re-run with a larger holdout share or longer duration.
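An interval on relative RPR lift, like the one quoted in the narrative example, can be estimated with a percentile bootstrap and then fed into the go/no-go rule. This is a sketch: the function names are hypothetical, and the inputs are per-profile revenue lists that include zeros for non-buyers.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_rpr_lift_ci(
    treatment: Sequence[float],
    control: Sequence[float],
    n_boot: int = 2000,
    conf: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for relative RPR lift (treatment / control - 1)."""
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    lifts: List[float] = []
    for _ in range(n_boot):
        # Resample each arm with replacement at its original size.
        t_mean = sum(rng.choices(treatment, k=len(treatment))) / len(treatment)
        c_mean = sum(rng.choices(control, k=len(control))) / len(control)
        if c_mean > 0:  # skip degenerate resamples with no control revenue
            lifts.append(t_mean / c_mean - 1)
    lifts.sort()
    lo_idx = int((1 - conf) / 2 * len(lifts))
    hi_idx = int((1 + conf) / 2 * len(lifts)) - 1
    return lifts[lo_idx], lifts[hi_idx]


def decide(ci: Tuple[float, float], guardrails_ok: bool) -> str:
    """Go only if the whole interval is above zero and guardrails hold."""
    return "go" if ci[0] > 0 and guardrails_ok else "no-go"
```

Writing the rule as a function before the test starts is one lightweight way to pre-register it: the thresholds are fixed in code before anyone sees results.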

Troubleshooting quick fixes

  • Holdout leakage: quarantine leaked IDs from final analysis; harden flow filters; re-sync suppression audiences.

  • Underpowered test: extend runtime, widen MDE, or increase the control percentage; if needed, power on conversion first, then validate RPR.

  • Identity mismatch: join on multiple identifiers (email, customer_id); quantify match rates and annotate bias risk.

  • Order webhook lag: extend your analysis window; rely on server-side order creation as the authoritative exit signal.
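For the holdout-leakage fix, a minimal sketch of the quarantine step is shown below. The data shapes (a dict of profile ID to arm, and an iterable of IDs that received any send) are assumptions about your export, not a Klaviyo format.

```python
from typing import Dict, Iterable, Set, Tuple


def quarantine_leaks(
    assignments: Dict[str, str],
    sent_ids: Iterable[str],
) -> Tuple[Dict[str, str], Set[str], float]:
    """Separate holdout profiles that received a send from the clean cohort.

    Returns (clean assignments, leaked IDs, leaked share of the holdout).
    """
    sent = set(sent_ids)
    holdout = {pid for pid, arm in assignments.items() if arm == "holdout"}
    leaked = holdout & sent
    clean = {pid: arm for pid, arm in assignments.items() if pid not in leaked}
    leak_rate = len(leaked) / len(holdout) if holdout else 0.0
    return clean, leaked, leak_rate
```

Report the leak rate alongside the results: dropping leaked profiles changes the holdout's composition, so a large share is a reason to rerun rather than to trust the cleaned estimate.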

References and further reading