Fantasy Lab/The Blind Backtest
Methodology Receipt · 2017–2025

Could the model actually win an ODB Fantasy contest?

We rebuilt the WSOP fantasy projection for every year from 2017 to 2025 using only the data available before that year's Main Event kickoff. No ridge regression. No knowledge of who broke through. The model picked an 8-player lineup at the actual ODB auction prices, added the highest-projected bonus pick from that year's curated bonus pool, and was scored against the real ODB Fantasy field using that year's contemporaneous rules.

Here's exactly what happened.

3/7
Cash rate
3 of 7 years in the money
0
Wins
no crowns — best finish 16th of 478
+143%
7-year ROI
Net $5K on $3.5K in entries (7 × $500)
68%
Avg finish
Beat ~68% of the field on average · rank 164

The headline: The model doesn't win crowns. But across seven years it cashed three times, returning about $8.5K on $3.5K in entries (7 × $500). That is a real edge, mostly produced by consistent top-quartile finishes rather than home runs. The biggest miss came in 2022, when our most expensive pick (Hellmuth) underperformed and we missed Koray Aldemir as a $2 sleeper.

The Method

What the model is allowed to see, and what it isn't.

Projector: Per-player projection = recency-weighted (5/4/3) average of fantasy points across Y-1, Y-2, Y-3, regressed toward 10 points using events/(events+30). No ridge regression, no future data, no information from year Y or beyond.
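A minimal sketch of the projector as described above, not the production code. The 5/4/3 weights, the 10-point regression target, and the events/(events+30) factor come from the text. The function name, the handling of missing seasons (skipped and re-normalized), and the reading of "events" as the event count behind the three seasons are assumptions.

```python
from typing import Optional, Sequence

WEIGHTS = (5, 4, 3)      # recency weights for Y-1, Y-2, Y-3 (from the text)
PRIOR_POINTS = 10.0      # regression target (from the text)
SHRINK_EVENTS = 30       # constant in events / (events + 30) (from the text)


def marcel_projection(points: Sequence[Optional[float]], events: int) -> float:
    """Project year-Y fantasy points from the three prior seasons.

    points: fantasy points for (Y-1, Y-2, Y-3); None marks a season with no data.
    events: events played across those seasons (assumed definition).
    """
    # Assumption: missing seasons are skipped and the weights re-normalized.
    pairs = [(w, p) for w, p in zip(WEIGHTS, points) if p is not None]
    if not pairs or events <= 0:
        return PRIOR_POINTS  # no history at all: fall back to the prior
    weighted_avg = sum(w * p for w, p in pairs) / sum(w for w, _ in pairs)
    reliability = events / (events + SHRINK_EVENTS)
    # Shrink the weighted average toward the prior when the sample is small.
    return reliability * weighted_avg + (1 - reliability) * PRIOR_POINTS


# Hypothetical player: 120, 90, 60 points over the last three years, 30 events.
print(marcel_projection((120, 90, 60), events=30))  # 52.5
```

With 30 events the reliability factor is exactly 0.5, so the projection sits halfway between the weighted average (95) and the 10-point prior.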

Pool + prices: Each year's actual ODB Fantasy player pool (primary + bonus, as published before that year's WSOP). Each year's actual ODB Fantasy auction (the prices contestants paid).

Bonus rule: ODB Fantasy has used 1 free bonus pick per roster every year since 2017. The blind model picks the top-projected name from the year's curated bonus pool.

Scoring + ranking: Each year's scoring rules (event multipliers + field bonuses from fantasy_event_entries). Final scores are ranked against each year's actual ODB Fantasy entries, the same teams the human owners submitted.

Excluded years: 2020, 2021 (WSOP cancelled / virtualized due to COVID.)

Why this is the honest backtest

Most projection models report flattering numbers because their projections were tuned with full knowledge of how players actually performed. Here, the 2025 lineup was picked using only 2022, 2023, and 2024 data. The 2024 lineup used 2021-2023. The 2017 lineup used 2014-2016. No back-fitting, no peeking at the answer key. The page you're reading was generated by hitting the blind-backtest endpoint, so you can verify it in the open.

Year by year

The seven lineups. The seven verdicts.

Each card shows the lineup the model picked using only pre-year data, what each player projected to score, what they actually scored, and how that team finished against the human-submitted field.

Two Projectors, One Test

What if we used ridge regression — also strictly blind?

Marcel is a simple 3-year recency-weighted average. Critics will fairly ask: what about your actual production ridge model — does it win when you force it to be blind? We re-trained the ridge regression model seven times, once per test year, using only data from strictly prior years (the 2017 ridge model trained on 2011-2016; the 2025 ridge model trained on 2011-2024). No future leakage in either. Result: the simpler projector wins.
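The strictly-prior-years rule can be sketched as a walk-forward split. This is an illustration under two assumptions: the training window starts in 2011 (the earliest year the text mentions), and the excluded years 2020 and 2021 are dropped from training as well as testing. The second assumption is inferred from the year counts in the per-year detail table (6 years for 2017, 12 for 2025); the function names are hypothetical.

```python
FIRST_DATA_YEAR = 2011           # earliest training year mentioned in the text
EXCLUDED_YEARS = {2020, 2021}    # assumption: also dropped from training
TEST_YEARS = [2017, 2018, 2019, 2022, 2023, 2024, 2025]


def training_years(test_year: int) -> list[int]:
    """Return only years strictly before the test year, so nothing leaks."""
    return [y for y in range(FIRST_DATA_YEAR, test_year)
            if y not in EXCLUDED_YEARS]


def walk_forward(test_years):
    """Yield (test_year, training_years) pairs, one model fit per test year."""
    for year in test_years:
        yield year, training_years(year)


for year, train in walk_forward(TEST_YEARS):
    print(year, f"{len(train)} yrs", f"{train[0]}-{train[-1]}")
```

Each test year gets its own model fit, and no row from the test year or later ever enters the training set.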

Marcel
★ Winner
3-year weighted average · no price feature · no training
3/7
Cash rate
0
Wins
+143%
ROI
$5K
Net
68%
Avg %ile
Ridge Regression (blind)
Ridge regression (λ=100) · re-trained for each test year using only data from strictly prior years · no future leakage
1/7
Cash rate
0
Wins
+0%
ROI
$0
Net
36%
Avg %ile
Why the simpler model wins (and what it teaches us)

The ridge regression model uses draft_price as a feature — and the training signal “expensive players score more” holds in historical data. So when blind ridge meets a year where the most expensive picks bust (looking at you, 2025: Ausmus, Seiver, Schulman all underperformed), it walks straight into the trap. Marcel ignores price entirely and just trusts each player's last three years of fantasy points, which turns out to be more robust when the auction is overpricing chalk. This is the kind of finding you only see when you actually run the blind test, instead of grading your own homework with a model that's seen the answer key.

Ridge per-year detail
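Year | Trained on          | Rank    | Score | Marcel rank | Result
-----|---------------------|---------|-------|-------------|-------
2017 | 6 yrs · 579 rows    | 222/255 | 523   | 26/255      | miss
2018 | 7 yrs · 641 rows    | 28/287  | 998   | 143/287     | $4K
2019 | 8 yrs · 759 rows    | 283/478 | 782   | 16/478      | miss
2022 | 9 yrs · 854 rows    | 346/433 | 862   | 308/433     | miss
2023 | 10 yrs · 962 rows   | 311/594 | 922   | 83/594      | miss
2024 | 11 yrs · 1111 rows  | 447/709 | 834   | 421/709     | miss
2025 | 12 yrs · 1257 rows  | 858/873 | 438   | 148/873     | miss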
Multi-Entry Portfolio Test

Three lineups, not one. Did diversification help?

The single-entry result above is volatile by design: one team, one shot at the field. Real fantasy contestants often run multiple lineups to smooth variance. So we tested a three-entry portfolio: a chalk lineup maximizing raw projection, a value lineup maximizing projection per dollar, and a contrarian lineup that excludes the top 25% of names by projection and then maximizes projection over the remaining pool. All three use the same blind Marcel projector. Each year costs $500 × 3 = $1,500 in entries.
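The three strategies can be sketched on a toy pool. Everything here is illustrative: the players, prices, projections, the 3-player roster, and the $100 budget are made up (real rosters are 8 players, and the auction budget is not stated on this page). "Projection per dollar" is read as total lineup projection divided by total lineup price, which is an assumption. The exhaustive search is only practical for a tiny pool.

```python
from itertools import combinations

# Hypothetical pool: (name, auction price in $, blind projection in points).
POOL = [
    ("A", 60, 150.0), ("B", 40, 120.0), ("C", 25, 90.0), ("D", 10, 50.0),
    ("E", 5, 30.0), ("F", 2, 14.0), ("G", 1, 9.0), ("H", 1, 8.0),
]


def total_projection(lineup):
    return sum(p[2] for p in lineup)


def projection_per_dollar(lineup):
    return total_projection(lineup) / sum(p[1] for p in lineup)


def best_lineup(pool, size, budget, objective):
    """Search every `size`-player lineup that fits the budget."""
    feasible = (c for c in combinations(pool, size)
                if sum(p[1] for p in c) <= budget)
    return max(feasible, key=objective, default=None)


def contrarian_pool(pool, drop_share=0.25):
    """Remove the top `drop_share` of players by projection."""
    ranked = sorted(pool, key=lambda p: p[2], reverse=True)
    return ranked[int(len(ranked) * drop_share):]


SIZE, BUDGET = 3, 100  # toy values, not the contest's real settings
chalk = best_lineup(POOL, SIZE, BUDGET, total_projection)
value = best_lineup(POOL, SIZE, BUDGET, projection_per_dollar)
contrarian = best_lineup(contrarian_pool(POOL), SIZE, BUDGET, total_projection)

for label, lineup in [("chalk", chalk), ("value", value), ("contrarian", contrarian)]:
    print(label, [p[0] for p in lineup])
```

On this toy pool the value objective picks the three cheapest names (F, G, H), which illustrates how a per-dollar objective ends up stacking low-priced sleepers.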

$10,500
Total entry
3 entries × 7 yrs
$9,500
Total payout
3/7 yrs cashed something
-$1,000
Net
vs. K=1 chalk: see below
-10%
ROI
portfolio over 7 years
The honest answer

Diversification didn't help in our blind test. The chalk lineup alone went 3 cashes / +143% ROI at $500 entry × 7 years. Adding value + contrarian lineups added $1,000 in extra payout but cost $7,000 extra in entries — net negative. The value strategy in particular gets killed by stacking $1 sleepers who collectively underperform their modest projections; the contrarian strategy avoids chalk but doesn't reliably find the next breakout. A more sophisticated multi-entry strategy — e.g., Monte-Carlo correlation-minimizing portfolio construction with adjusted bonus picks — is the next iteration. For now, the single chalk lineup beats this naive diversification.
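The arithmetic behind those two ROI figures can be checked directly. The dollar amounts are the ones quoted on this page; the chalk-only payout is derived as the portfolio payout minus the $1,000 the other two lineups added. The function and variable names are just for illustration.

```python
ENTRY_FEE = 500
YEARS = 7


def roi(payout: float, entries: float) -> float:
    """Return on investment: net profit divided by total entry fees."""
    return (payout - entries) / entries


portfolio_entries = ENTRY_FEE * 3 * YEARS   # $10,500 for three lineups a year
portfolio_payout = 9_500                    # total payout quoted above
chalk_entries = ENTRY_FEE * YEARS           # $3,500 for the single chalk lineup
chalk_payout = portfolio_payout - 1_000     # value + contrarian added $1,000

print(f"chalk only: {roi(chalk_payout, chalk_entries):+.0%}")          # +143%
print(f"portfolio:  {roi(portfolio_payout, portfolio_entries):+.0%}")  # -10%
print(f"extra entries: ${portfolio_entries - chalk_entries:,}")        # $7,000
```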

The Receipt, Stacked

Best year to worst, by raw finishing position.

2019
16th of 478·1,173 pts vs winner's 1,379
97%ile
$4K
2017
26th of 255·960 pts vs winner's 1,297
90%ile
$4K
2023
83rd of 594·1,157 pts vs winner's 1,813
86%ile
$1K
2018
143rd of 287·724 pts vs winner's 1,259
50%ile
miss
2025
148th of 873·1,156 pts vs winner's 1,615
83%ile
miss
2022
308th of 433·939 pts vs winner's 1,952
29%ile
miss
2024
421st of 709·850 pts vs winner's 1,679
41%ile
miss
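The percentile figures can be reproduced from rank and field size. The formula below, share of the field beaten = (field − rank) / field, is inferred from the numbers rather than stated anywhere on the page; it matches all seven rounded percentiles, the 68% average, and the average rank of 164.

```python
# year: (finishing rank, field size), copied from the list above
RESULTS = {
    2019: (16, 478), 2017: (26, 255), 2023: (83, 594), 2018: (143, 287),
    2025: (148, 873), 2022: (308, 433), 2024: (421, 709),
}


def percentile_beaten(rank: int, field: int) -> float:
    """Percentage of the field that finished below this entry (assumed formula)."""
    return 100 * (field - rank) / field


percentiles = {year: percentile_beaten(rank, field)
               for year, (rank, field) in RESULTS.items()}
avg_percentile = sum(percentiles.values()) / len(percentiles)
avg_rank = sum(rank for rank, _ in RESULTS.values()) / len(RESULTS)
by_percentile = sorted(percentiles, key=percentiles.get, reverse=True)

for year in by_percentile:
    print(year, f"{percentiles[year]:.0f}%ile")
print(f"average: {avg_percentile:.0f}%ile, rank {avg_rank:.0f}")
```

Because field sizes grew from 255 to 873, sorting by percentile rather than raw rank changes the order: 2025 moves ahead of 2018, and 2024 ahead of 2022.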
What this isn't

Three caveats no honest researcher should skip.

1. The pool itself wasn't blind.

We used the actual ODB Fantasy player pool from each year — which the human-run contest curated based on their own forward-looking judgment of who'd be relevant. Our model didn't pick the universe; it picked within the universe. If the curators had a systematic blind spot (say, missing a hot mid-stakes pro who later broke out), our model inherited that blind spot.

2. One entry per year is high variance.

Real ODB Fantasy owners often submit several lineups each; the big winners often have 3 to 10 different submissions. We're scoring a single optimal lineup against fields where the top finishers may have spent up to 10× the entry fee. A 3-of-7 cash rate from a single annual entry is still well above what a random entry would manage, because only the top slice of each field is paid (148th of 873 missed the money in 2025). Multiple diversified entries could cash more often while still crowning rarely, although the naive three-entry portfolio tested above did not.

3. The scoring rules drift.

ODB introduced the bracelet bonus around 2025. Earlier years had a simpler structure. We apply each year's actual rules from fantasy_event_entries, but a meta-strategy that exploits the bracelet bonus (e.g., loading up on elite mixed-game specialists) only became correct in the last two years. Our Marcel projector doesn't know about the rule change; it just projects fantasy points using the rules in effect at scoring time.

The takeaway

The model has real signal: it beat about 68% of the field on average, with four of seven finishes in the top fifth of 255-to-873-person fields. But it isn't a contest-winning machine on a single-entry basis. The realistic path to a crown is probably a better-constructed three-to-five entry portfolio (the naive one tested here lost money), not a single optimum. That's the calibrated honest claim, and these receipts are how we get there.