MLB Game Totals Model

Counting Runs Before the First Pitch

Our MLB Game Totals model builds a probability distribution over every possible final score, then derives the true edge against the over/under line. We don't guess at run totals. We model the game from the inside out.

Think of an insurance underwriter pricing a homeowner's policy. They don't look at last year's total claims and call it a day. They look at the property, the wind zone, the roof age, the credit history, the proximity to a fire station, and they build a probability distribution over how much the insurer might have to pay out. The premium is set so that, across thousands of similar policies, the math works in their favor.

Our MLB Game Totals model works the same way. The market posts a number (say, 8.5 runs) and asks you to bet whether the game lands over or under. Our job is to build the most accurate possible probability distribution for tonight's specific game, then take the side where the market has mispriced the risk. This page explains how we get there.

The Challenge

Why MLB Totals Are Hard

Baseball produces 162 games per team per season, more data than any other major sport, but each game is also more variable than people realize. A single home run can add as many as four runs to a total. A bullpen meltdown in the seventh inning can flip an under into an over before the closer ever warms up. A 15 mph wind blowing out at Wrigley turns a routine fly ball into a souvenir. The book has to price all of that into one number.

Lumpy Outcomes

Runs come in discrete clumps: solo shots, two-out rallies, six-run innings. Smooth bell-curve thinking misses the way scoring actually distributes around the average.

Park & Weather

Coors Field plays nothing like Petco. Wind out vs. wind in changes a stadium's effective scoring environment dramatically. A flat, season-long park factor misses that; our model prices tonight's conditions.

Three-Phase Game

Starters set the tone. Bullpens decide the middle. Closers shape the end. Each phase scores at a different rate, and you can't price a total without modeling all three separately.

The opportunity: Most MLB total lines start from a generic park-adjusted scoring expectation and then move with action. Our model rebuilds tonight's scoring environment from the inside (pitcher, lineup, weather, umpire) and computes the full probability of every possible run total. The gaps between that distribution and the market's implied probability are where the edge lives.

The Projection Engine

A Distribution, Not a Number

Picture a chef plating a thousand identical orders. Even with the same recipe and the same ingredients, no two plates are ever exactly the same; there's always a slight variation in how a sauce pools or a garnish lands. If you wanted to price a guarantee on the dish's weight, you wouldn't use the average; you'd use the full distribution of plate weights so you knew the probability of any given plate hitting any given threshold. Run scoring works the same way. Two teams with the same expected runs can produce wildly different actual totals on the night, so the model doesn't stop at an expected value; it builds the entire distribution.

The engine runs in two stages, one team at a time. Stage one estimates how many runs each team is expected to score in this specific matchup, blending stable identity (season talent) with the latest signals (recent form, lineup, opponent). Stage two takes that expectation and spreads it across the full range of possible outcomes (the probability of 0 runs, 1 run, 2 runs, all the way up) using a distribution shape calibrated to how baseball actually scores.

Stage	What It Does
Stage 1: Expected Runs	For each team, blend talent and form into an expected run count for tonight's specific matchup. Inputs flow in from the data pipeline (next section).
Stage 2: Run Distribution	Convert that expectation into a full probability distribution over possible run totals. Captures the lumpy, asymmetric nature of how runs cluster.
Combine Teams	Merge the home and away distributions to produce the joint probability of every possible game total. The over/under probability falls out cleanly from this.
Calibrate	Map raw probabilities through a calibration layer trained on settled outcomes so the numbers we output match the rates we actually hit historically.

Key insight: The book sets one line. We compute thousands of probabilities and then check the only one that matters: does the market's number land where our distribution thinks it should? When it doesn't, we have edge.
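To make the two stages concrete, here is a minimal sketch of Stage 2 and the combine step. It is not our production code: the negative binomial shape, the dispersion value of 4.0, the expected-run inputs, and the function names (run_pmf, game_total_pmf, prob_over) are all illustrative assumptions, and it treats the two teams' run counts as independent, which a real game only approximates.

```python
import math

def run_pmf(mean_runs, dispersion, max_runs=30):
    """Negative binomial PMF over 0..max_runs runs for one team.

    variance = mean + mean**2 / dispersion, so a smaller dispersion
    means lumpier, more spread-out scoring. (Assumed shape, not the
    model's actual distribution.)
    """
    p = dispersion / (dispersion + mean_runs)
    pmf = []
    for k in range(max_runs + 1):
        log_coef = (math.lgamma(k + dispersion)
                    - math.lgamma(dispersion)
                    - math.lgamma(k + 1))
        pmf.append(math.exp(log_coef
                            + dispersion * math.log(p)
                            + k * math.log(1.0 - p)))
    total = sum(pmf)
    return [x / total for x in pmf]  # renormalize after truncating the tail

def game_total_pmf(home_pmf, away_pmf):
    """Convolve two team PMFs, assuming the teams score independently."""
    total = [0.0] * (len(home_pmf) + len(away_pmf) - 1)
    for h, ph in enumerate(home_pmf):
        for a, pa in enumerate(away_pmf):
            total[h + a] += ph * pa
    return total

def prob_over(total_pmf, line):
    """Probability the combined run total lands strictly above the line."""
    return sum(p for runs, p in enumerate(total_pmf) if runs > line)

# Hypothetical Stage 1 outputs: 4.6 expected runs at home, 4.1 away.
home = run_pmf(4.6, dispersion=4.0)
away = run_pmf(4.1, dispersion=4.0)
total = game_total_pmf(home, away)
p_over = prob_over(total, 8.5)
print(f"P(over 8.5) = {p_over:.3f}, P(under 8.5) = {1.0 - p_over:.3f}")
```

On a half-run line such as 8.5 the under is simply one minus the over. On a whole-number line the push probability has to be carved out separately.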

Starter Profile

Tonight's starter is the single biggest variable in any MLB game. The model evaluates each pitcher across their full season profile, recent form, and platoon-adjusted matchup against the opposing lineup.

A flame-thrower in his fourth start back from injury is a different bet than the same pitcher in May. The model captures the trajectory, not just the season line.

Bullpen Quality

A great starter can be undone by a bullpen melting down in the sixth. The model factors in each team's relief depth, recent usage, and projected workload to estimate how the back half of the game will score.

Teams with overworked bullpens after a doubleheader play very differently than teams with five days of rest behind their relievers.

Lineup Quality

A lineup is more than nine names. The model evaluates the projected batting order's recent and seasonal offensive output, along with platoon advantages against tonight's starter. Late lineup scratches and rest days are picked up before the market adjusts.

Game Environment

Park factors, weather (temperature, wind direction and speed, humidity), and umpire tendencies all shape the run-scoring environment. The model treats these as multipliers on top of the talent matchup, not arbitrary nudges.
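As a rough illustration of what "multipliers on top of the talent matchup" means, the sketch below scales a talent-based run expectation by three environment factors. The Environment class, the apply_environment function, and every number in it are made up for the example; the real factors and how they are estimated are not described on this page.

```python
from dataclasses import dataclass

@dataclass
class Environment:
    # 1.0 means neutral; values above 1.0 inflate scoring, below 1.0 suppress it.
    park: float = 1.0
    weather: float = 1.0
    umpire: float = 1.0

def apply_environment(talent_runs, env):
    """Scale a talent-based expected run count by the game environment."""
    return talent_runs * env.park * env.weather * env.umpire

# Hypothetical night: hitter-friendly park, wind blowing out, tight strike zone... 
# except the umpire here slightly favors pitchers.
tonight = Environment(park=1.10, weather=1.04, umpire=0.98)
adjusted = apply_environment(4.5, tonight)
print(f"Talent-only: 4.50 runs, environment-adjusted: {adjusted:.2f} runs")
```

Multiplying rather than adding means the same wind helps a strong lineup more, in absolute runs, than a weak one.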

Finding Value

Where Edges Come From

Books set MLB total lines from a mix of generic scoring expectations and market action. They're sophisticated, but they can't individually price every park-weather-pitcher-lineup combination on a 15-game slate every night. We look for specific gaps:

Starter mispricing: The book line reflects a starter's headline ERA, but the model sees a meaningful gap in underlying performance, recent form, or matchup-specific platoon splits.
Bullpen leverage: One team's relievers are markedly fresher or sharper than the other's, and the line hasn't fully priced in the late-game disparity.
Weather divergence: Wind direction and speed have shifted since the line was posted, or temperature is materially different from the seasonal park average baked into the book's adjustment.
Distribution shape: Two games with the same expected runs can have very different probabilities of hitting a 9.0 line because of how the distribution clusters. The model reads the shape, not just the mean.
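The distribution-shape point is easiest to see with a toy example. The two game-total distributions below are invented for illustration (they are not model output), and the summarize helper is a hypothetical name. Both have an expected total of exactly 9.0 runs, yet they treat a 9.0 line very differently.

```python
def summarize(total_pmf, line):
    """Mean plus over / push / under probabilities for a {runs: prob} map."""
    return {
        "mean": sum(runs * p for runs, p in total_pmf.items()),
        "over": sum(p for runs, p in total_pmf.items() if runs > line),
        "push": sum(p for runs, p in total_pmf.items() if runs == line),
        "under": sum(p for runs, p in total_pmf.items() if runs < line),
    }

# Toy distributions with the same mean (9.0) but different shapes.
tight = {8: 0.25, 9: 0.50, 10: 0.25}                        # clustered on 9
skewed = {6: 0.30, 8: 0.30, 10: 0.20, 12: 0.10, 16: 0.10}   # long right tail

for name, pmf in (("tight", tight), ("skewed", skewed)):
    s = summarize(pmf, 9.0)
    print(name, {k: round(v, 2) for k, v in s.items()})
```

In the tight game the over and under are a coin flip once pushes are refunded. In the skewed game the occasional blowout drags the mean up, so the under wins more often than the over even though the expected total matches the line.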

Model vs. Market

For every line on the board, the model outputs P(over) and P(under). We strip the bookmaker's margin from both sides to get a fair implied probability. The gap between our calibrated probability and the fair probability is the edge. If a book prices both sides of 8.5 at -110, each side's raw implied probability is about 52.4%, and removing the margin leaves a fair probability of 50.0%. If the model says the true P(OVER) is 58.7%, the edge is +8.7 percentage points.
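Here is a small sketch of the margin-stripping step. It uses proportional normalization, which is one common way to de-vig a two-way market and only an assumption about method here; the function names and the -115 / -105 prices are hypothetical.

```python
def implied_prob(american_odds):
    """Raw implied probability of an American price, margin still included."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)

def devig(over_odds, under_odds):
    """Return (fair P(over), fair P(under), bookmaker margin).

    Proportional normalization: divide each raw implied probability by
    their sum so the two sides add up to exactly 1.
    """
    raw_over = implied_prob(over_odds)
    raw_under = implied_prob(under_odds)
    overround = raw_over + raw_under
    return raw_over / overround, raw_under / overround, overround - 1.0

# Hypothetical market: OVER 8.5 at -115, UNDER 8.5 at -105.
fair_over, fair_under, margin = devig(-115, -105)
model_p_over = 0.587  # calibrated model probability (illustrative)
edge = model_p_over - fair_over
print(f"fair P(over) = {fair_over:.3f}, margin = {margin:.3f}, edge = {edge:+.3f}")
```

The margin returned here is the same quantity a vig ceiling would be checked against.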

Important: Not every edge survives the gate stack. The model enforces minimum edge thresholds, vig ceilings, and odds-band filters. Only picks where the probability advantage is meaningful and the market structure is healthy make it through.

The Confidence System

Multi-Gate Quality Control

Like a TSA line for picks, every candidate has to pass several independent checkpoints before it can board the slate. Failing any one of them is enough to drop it.

Gate 1: Edge Threshold

Calibrated probability must beat the de-vigged fair probability by a meaningful margin. Marginal edges are rejected; the vig eats them.

Gate 2: Vig Ceiling

Markets with excessive bookmaker margin are excluded. If the juice is wide enough, even a real probability advantage doesn't pay.

Gate 3: Odds Band

The model is most reliable inside a validated odds range. Outside that range, conviction is dampened or the pick is dropped entirely.

Gate 4: Slate Top Six

Of every candidate that survives the gates, only the six highest-conviction picks per slate surface to subscribers. One pick per game maximum.
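The four gates can be sketched as a filter followed by a capped, one-per-game selection. The thresholds below (MIN_EDGE, MAX_VIG, ODDS_BAND) are placeholder values, not the model's real settings, and the Candidate fields are assumed names. Ranking here is by edge alone to keep the example short.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    game_id: str
    side: str
    edge: float          # calibrated probability minus de-vigged fair probability
    vig: float           # bookmaker margin on this market
    american_odds: int

# Placeholder thresholds for illustration only.
MIN_EDGE = 0.03
MAX_VIG = 0.06
ODDS_BAND = (-130, 120)  # accepts prices from -130 through +120
SLATE_CAP = 6

def passes_gates(c):
    """Gates 1-3: any single failure drops the candidate."""
    return (c.edge >= MIN_EDGE
            and c.vig <= MAX_VIG
            and ODDS_BAND[0] <= c.american_odds <= ODDS_BAND[1])

def select_slate(candidates):
    """Gate 4: best survivors first, one per game, at most SLATE_CAP picks."""
    survivors = sorted((c for c in candidates if passes_gates(c)),
                       key=lambda c: c.edge, reverse=True)
    picked, seen_games = [], set()
    for c in survivors:
        if c.game_id in seen_games:
            continue
        picked.append(c)
        seen_games.add(c.game_id)
        if len(picked) == SLATE_CAP:
            break
    return picked

slate = select_slate([
    Candidate("G1", "OVER 8.5", 0.071, 0.045, -108),
    Candidate("G1", "OVER 9.0", 0.052, 0.048, 102),    # same game, lower edge
    Candidate("G2", "UNDER 7.5", 0.012, 0.044, -105),  # fails edge gate
    Candidate("G3", "UNDER 9.5", 0.064, 0.081, -112),  # fails vig gate
    Candidate("G4", "OVER 10.5", 0.090, 0.046, 145),   # fails odds band
    Candidate("G5", "UNDER 8.0", 0.043, 0.047, -115),
])
print([(c.game_id, c.side) for c in slate])
```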

Edge-Ranked, Tier-Sized

Picks that survive the gates are ranked by a value/edge blend. The top six are assigned to confidence tiers that determine unit sizing: higher-edge picks get bigger allocations, lower-tier picks get smaller ones.

Metric	How It Works
Edge	Calibrated model probability minus the de-vigged fair probability. The cleanest measure of how much the market has mispriced a side.
EV	Expected value per unit, computed from the model probability and the actual American odds offered.
Tier	Confidence band derived from edge strength. Drives unit allocation across the slate.
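For readers who want the EV row spelled out, a minimal sketch follows. It ignores pushes, which matter on whole-number lines, and the function names are illustrative rather than taken from our codebase.

```python
def profit_per_unit(american_odds):
    """Profit on a winning one-unit stake at the given American price."""
    if american_odds > 0:
        return american_odds / 100.0
    return 100.0 / -american_odds

def expected_value(p_win, american_odds):
    """EV per unit staked: win the payout with p_win, lose the stake otherwise."""
    return p_win * profit_per_unit(american_odds) - (1.0 - p_win)

# Model probability of 58.7% at a -110 price.
ev = expected_value(0.587, -110)
print(f"EV per unit: {ev:+.3f}")
```

Edge and EV answer different questions: edge compares probabilities, while EV also accounts for the price actually offered, so the same edge is worth more at plus money than at a heavy minus price.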

Important: Individual picks still lose. MLB game totals are inherently noisy because of how lumpy run scoring is. The model is built for expected value over hundreds of bets, not certainty on any single one.

How We Measure Success

Anyone can post a hot weekend. We care about what holds up across hundreds of picks. Here's what we track and why it matters:

Win Rate by Tier

Higher-conviction picks should win at higher rates. Each tier is tracked separately to validate that conviction maps to outcomes.

Return on Investment

Win rate alone is misleading on plus-money totals. ROI captures the price you got, not just whether you cashed.

Closing Line Value

Did the line move in our direction after lock? Consistent positive CLV is the surest sign that the model is reading the market correctly.

Sample Size

Twenty picks tell you almost nothing. Hundreds of picks reveal whether the edge is structural or just variance.
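A quick way to see why sample size matters is to put an uncertainty band around an observed win rate. The sketch below uses the normal approximation to the binomial, which is rough at small samples, and win_rate_interval is a hypothetical helper name.

```python
import math

def win_rate_interval(wins, picks, z=1.96):
    """Approximate 95% interval for the true win rate (normal approximation)."""
    rate = wins / picks
    half_width = z * math.sqrt(rate * (1.0 - rate) / picks)
    return max(0.0, rate - half_width), min(1.0, rate + half_width)

# The same 55% observed win rate at two sample sizes.
for wins, picks in ((11, 20), (220, 400)):
    low, high = win_rate_interval(wins, picks)
    print(f"{wins}/{picks}: {low:.1%} to {high:.1%}")
```

At 20 picks the band runs from roughly 33% to 77%, which is compatible with almost any underlying skill level. At 400 picks it tightens to roughly 50% to 60%, and it keeps narrowing with the square root of the sample size.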

Closing Line Value (CLV)

CLV measures whether the line moved toward us after we locked in. If we bet UNDER 8.5 at -105 and the UNDER closes at -120, the market confirmed our read. We got a better price than the closing consensus.
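One simple way to put a number on that example is to compare implied probabilities at lock and at close. This sketch only handles price moves on the same total; a move in the run line itself (say 8.5 to 8.0) would need extra handling. The function names are illustrative.

```python
def implied_prob(american_odds):
    """Raw implied probability of an American price."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)

def closing_line_value(locked_odds, closing_odds):
    """Closing implied probability minus locked implied probability.

    Positive means the market ended up pricing our side as more likely
    than the price we locked, i.e. we beat the close.
    """
    return implied_prob(closing_odds) - implied_prob(locked_odds)

# UNDER 8.5 locked at -105, closed at -120.
clv = closing_line_value(-105, -120)
print(f"CLV: {clv:+.3f} in implied probability")
```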

Sportsbooks use CLV to identify their sharpest customers. Consistent positive CLV is widely regarded as one of the strongest indicators of long-term profitability; it means we're regularly getting a better price than the closing market.

The Bottom Line

MLB scoring is noisy, but it isn't random. Park, weather, starters, bullpens, and lineups push the run distribution in predictable ways every night. Our model does the heavy lifting of analyzing each factor for every game on every slate, but the philosophy is simple:

Build distributions, not point estimates: the full probability surface over every possible run total, not just an expected value
Model the environment, not the average: park, weather, and umpire layered as multipliers on the starter, lineup, and bullpen talent matchup
Filter ruthlessly: edge thresholds, vig ceilings, odds bands, and a hard slate cap of six picks per night, one per game
Calibrate against reality: raw probabilities are mapped through a calibration layer trained on settled results
Track everything: every locked pick is graded against final scores and tracked against closing lines

The scoreboard shows runs. Our job is to know the shape of the distribution before the first pitch is thrown.

Technical Breakdown

For the quantitative readers: a deeper look at the architecture without the proprietary internals.

Data Pipeline

Component	Description
Pitcher Profiles	Season-long and trailing-window quality metrics for every active starter and relief arm, blended across multiple time horizons
Lineup Quality	Projected batting orders, platoon-adjusted offensive metrics, recent form, and lineup-injury context
Park Factors	Per-stadium scoring environment, separated by handedness and approach, refreshed seasonally
Weather Feed	Game-time temperature, wind speed and direction, humidity, and precipitation chance for every venue
Umpire Context	Plate umpire tendencies: strike-zone size, called-strike rate over recent windows
Live Odds	Total-runs odds aggregated from multiple US sportsbooks in real time, used for both edge calculation and line-movement tracking

Projection Pipeline

Step	Process
1. Per-Team Expectation	Stage 1 model produces an expected run count for each team, blending talent and form with matchup adjustments
2. Per-Team Distribution	Stage 2 spreads each expectation across a calibrated run-count distribution
3. Game Distribution	Combine team distributions to produce the joint probability of every possible game total
4. Calibration	Map raw P(over) / P(under) through a calibration layer trained on settled outcomes
5. Edge Analysis	Compare to de-vigged market probability, compute EV, run the multi-gate filter stack
6. Slate Ranking	Rank surviving candidates and surface the top six for the slate, one per game
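Step 4 is the least intuitive part of the pipeline, so here is a sketch of what a calibration layer can look like. This page does not specify the actual method; histogram binning is used below purely as a simple stand-in, and fit_binned_calibrator and its sample data are hypothetical.

```python
def fit_binned_calibrator(raw_probs, outcomes, n_bins=10):
    """Learn, per probability bin, how often the bet actually won.

    raw_probs: uncalibrated model probabilities for settled picks.
    outcomes:  1 if the pick won, 0 if it lost (same order).
    Returns a function that maps a raw probability to the observed
    win rate of its bin, or leaves it unchanged if the bin is empty.
    """
    wins = [0.0] * n_bins
    counts = [0] * n_bins
    for p, won in zip(raw_probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)
        wins[b] += won
        counts[b] += 1

    def calibrate(p):
        b = min(int(p * n_bins), n_bins - 1)
        return wins[b] / counts[b] if counts[b] else p

    return calibrate

# Toy history: the raw model said about 65% on ten picks, six of them won.
history_probs = [0.65] * 10
history_outcomes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
calibrate = fit_binned_calibrator(history_probs, history_outcomes)
print(calibrate(0.62))  # a raw 62% falls in the same bin as the 65% picks
```

The idea is the same whatever the method: if picks the raw model called 65% have historically won 60% of the time, the number we publish should be 60%.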

Daily Schedule (ET)

Time	Job
Morning	Settlement: previous slate graded against verified final scores
Midday	Projection: full pipeline runs against today's weather, lineups, and odds snapshot
11:00 AM & 4:35 PM	Two fixed lock windows: afternoon games lock at 11:00 AM, the main evening slate at 4:35 PM. Once locked, the line and odds are final for the slate.
Throughout the day	CLV tracking: locked picks compared to live and closing odds
Weekly	Calibration retrained from rolling settled-outcome window

Risk Management

The model includes automated guardrails: drawdown circuit breakers, drift detection across rolling result windows, and odds-band-specific performance gating. If the model's recent behavior diverges from historical patterns, position sizing throttles automatically before the issue compounds.
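A drawdown circuit breaker can be sketched in a few lines. The thresholds (5 and 10 units) and the half-size throttle are invented for the example, as are the function names; the production guardrails also include drift detection and odds-band gating, which this sketch leaves out.

```python
def current_drawdown(unit_profits):
    """Units lost since the highest point of the cumulative profit curve."""
    running = 0.0
    peak = 0.0
    for profit in unit_profits:
        running += profit
        peak = max(peak, running)
    return peak - running

def sizing_multiplier(unit_profits, soft_limit=5.0, hard_limit=10.0):
    """Throttle stake size as the drawdown deepens (placeholder thresholds)."""
    drawdown = current_drawdown(unit_profits)
    if drawdown >= hard_limit:
        return 0.0   # breaker tripped: stop sizing new picks
    if drawdown >= soft_limit:
        return 0.5   # halve stakes while the slide is investigated
    return 1.0

# Settled results in units: one win followed by five straight losses.
results = [2.0, -1.0, -1.0, -1.0, -1.0, -1.0]
print(current_drawdown(results), sizing_multiplier(results))
```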

Every pick has a complete paper trail: model probability, locked odds, closing odds, actual result, and realized profit. No manual intervention, no cherry-picking. You can audit the full history.
