The Forecast Has a Price
How we harness the world's most powerful weather supercomputers, correct their blind spots, and find the moments when prediction markets get the weather wrong.
Every day, Polymarket runs prediction markets asking a deceptively simple question: "What will the high temperature be in [city] tomorrow?" Traders buy and sell shares on exact temperature brackets like "72°F" or "24°C," and the price of each share reflects the crowd's consensus probability that the temperature will land in that bracket.
Think of the market as a roulette wheel with 20+ slots, except the outcome is governed by atmospheric physics rather than random chance. The crowd places its chips. Our job is to figure out which slots are overpriced and which are underpriced.
We do not try to out-predict the weather. The world's meteorological agencies spend billions of dollars on that problem and they are very good at it. Instead, we stand on their shoulders: we ingest their ensemble forecasts, scrub out systematic errors, build calibrated probability distributions, and compare them against what the market is pricing. When the crowd disagrees with the physics, we act.
The Challenge
Why Temperature Markets Are Deceptively Complex
Predicting whether tomorrow will be "warm" or "cold" is easy. Your weather app does that just fine. But predicting that the high will land in a specific 1-degree bracket out of 20+ possible outcomes is a fundamentally different problem. It is the difference between guessing that a dart will hit the board versus calling the exact number it lands on. Most people can feel the direction. Almost nobody gets the precision right.
Bracket Precision
Markets resolve to exact degree brackets. Being off by just 1° means the difference between a winning and losing position.
Station Specificity
Markets resolve at a specific weather station, not a city average. Microclimates near airports can diverge from downtown conditions.
Forecast Bias
Raw weather models have systematic biases that vary by location, season, and model. Uncorrected forecasts lead to miscalibrated probabilities.
The opportunity: Most traders rely on a single weather app or rough estimates. We aggregate hundreds of ensemble members from multiple world-class forecast systems, correct their biases, and compute true probability distributions. When the crowd gets the shape of uncertainty wrong, the edge can be significant.
The Ensemble Engine
The Ensemble Advantage
Imagine you need to estimate the temperature tomorrow. You could ask one expert, but even the best meteorologist has blind spots. Now imagine assembling a jury of over 200 independent experts, each running their own physics simulation of the atmosphere with slightly different starting conditions. Some emphasize ocean currents. Others weight solar radiation or jet stream positioning differently. Each produces a slightly different forecast.
That is exactly how ensemble forecasting works. Each "juror" is a perturbed run of a full atmospheric simulation: same physics engine, different initial conditions. When the jurors agree, confidence is high. When they scatter, uncertainty is real. The shape of their disagreement is the signal we trade on.
| Concept | Analogy |
|---|---|
| Ensemble member | One juror's verdict: informed but imperfect |
| Ensemble spread | How much the jury disagrees. Wide spread = high uncertainty |
| Weighted average | The final verdict, giving more weight to historically accurate jurors |
| Probability distribution | Not just the verdict, but how confident the jury is in each possible outcome |
Key insight: A single forecast says "tomorrow will be 72°F." Our ensemble says "there is a 35% chance of 71-72°F, a 28% chance of 73-74°F, and a 15% chance of 69-70°F." That probability shape is what lets us compare against market prices and find mispricings.
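The core move from members to probabilities can be sketched in a few lines. This is a simplified illustration, not our production code: the function name, the 1-degree brackets, and the sample values are all hypothetical. We simply count how many ensemble members land in each bracket and normalize.

```python
from collections import Counter

def bracket_probabilities(member_highs):
    """Turn ensemble member forecasts into per-bracket probabilities
    by counting how many members land in each 1-degree bracket."""
    counts = Counter(int(t) for t in member_highs)  # floor for positive temps
    total = len(member_highs)
    return {deg: n / total for deg, n in sorted(counts.items())}

# Eight illustrative members: three land in the 72°F bracket.
probs = bracket_probabilities([70.6, 71.2, 71.8, 72.1, 72.4, 72.9, 73.3, 74.0])
# probs[72] == 0.375
```

Real ensembles use hundreds of members and smoother density estimates, but the principle is the same: the histogram of member outcomes is the probability shape we trade on.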
Multiple Forecast Systems
We do not rely on a single weather model. We aggregate forecasts from multiple independent global forecast systems, each with different strengths. Some excel at short-range precision, others at capturing extreme scenarios.
Each system contributes dozens of ensemble members, giving us a rich picture of forecast uncertainty that no single model provides alone.
Weighted by Track Record
Not all forecast systems are created equal. Some consistently outperform others in specific contexts. We evaluate each system's track record and weight their contributions accordingly.
Like a fantasy lineup, we give more playing time to the most reliable performers, but keep the bench active for when conditions favor them.
Bias Correction
Every instrument has quirks. A bathroom scale that always reads 2 pounds heavy. A car speedometer that drifts above 60. Weather models are no different. They have systematic biases that skew their forecasts in predictable ways. The key word is predictable. A bias you can measure is a bias you can fix.
Calibrating the Instruments
But weather model biases are more nuanced than a bathroom scale. They vary across three dimensions simultaneously. A model that runs warm in summer might run cold in winter. The same model might be perfectly calibrated for coastal cities but consistently biased for inland ones. One forecast system might nail New York but struggle with Dallas. We learn these fingerprints individually and correct for each one.
- Station: each city's weather station has unique local effects that models handle differently
- Season: winter biases differ from summer biases, so we train separate corrections for each
- Model: each forecast system has its own fingerprint of biases that we learn independently
We maintain a comprehensive library of correction parameters tailored to each unique forecasting context. These are trained on historical forecast-versus-observation pairs and retrained periodically as conditions evolve.
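As a minimal sketch of what such a correction library might look like: this assumes a simple mean-offset model keyed by (city, season, model), with names and data shapes invented for illustration. The actual trained parameters are richer than a single offset.

```python
from collections import defaultdict

def fit_bias(history):
    """history: iterable of (city, season, model, forecast, observed).
    Learn the mean forecast-minus-observed offset for each context."""
    sums = defaultdict(lambda: [0.0, 0])
    for city, season, model, fc, obs in history:
        entry = sums[(city, season, model)]
        entry[0] += fc - obs
        entry[1] += 1
    return {key: s / n for key, (s, n) in sums.items()}

def correct(bias, city, season, model, forecast):
    # Subtract the learned bias; unseen contexts fall back to no correction.
    return forecast - bias.get((city, season, model), 0.0)
```

Retraining is then just re-running `fit_bias` on a fresh window of forecast-versus-observation pairs as conditions evolve.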
Finding Value
Where Edges Come From
The Polymarket crowd is not wrong on average. Collective wisdom is powerful. But prediction markets are made by humans, and humans have well-documented blind spots when it comes to estimating probabilities. The same cognitive biases that make people buy lottery tickets and fear flying more than driving show up in temperature markets. We exploit the gap between what people feel and what the physics says.
Four recurring patterns create structural edges:
- Probability mispricing: Traders tend to miscalibrate the likelihood of outcomes outside the consensus range. Our ensemble quantifies this uncertainty directly, revealing edges the crowd misses.
- Recency bias: Markets sometimes overweight recent conditions and underweight how quickly weather patterns can shift. Our model captures the full range of possible outcomes.
- Uncertainty underestimation: Traders often concentrate probability too tightly around a single expected outcome. Our proprietary approach directly measures the true width of the distribution.
- Market inefficiency: Some markets are thinner and less efficient than others. Fewer participants means less competition and larger potential edges.
The Daily Cycle
From Data to Decision
Like a newsroom racing to deadline, our pipeline runs multiple cycles each morning, each one incorporating fresher data than the last. Early runs use overnight model outputs. Later runs layer in morning updates. You never see the drafts, only the final, locked edition.
- 3:00 AM ET: overnight model runs ingested, bias corrections applied, initial probabilities computed
- 7:00 AM ET: morning model runs update probabilities, market prices fetched, edges recalculated
- 10:00 AM ET: picks lock and become visible. No changes after this point. Accountability is absolute
After lock: High-resolution monitoring continues for US cities, but locked picks are never modified. This ensures you always see the same picks we committed to before the outcome was known.
Global Coverage
10 Cities, 4 Continents
Each city resolves at a specific airport weather station. This matters. A station at a coastal airport may read differently from downtown conditions. Our bias corrections are trained per station.
| City | Unit | Timezone |
|---|---|---|
| Chicago | °F | Central |
| Miami | °F | Eastern |
| Atlanta | °F | Eastern |
| Dallas | °F | Central |
| Seattle | °F | Pacific |
| Toronto | °C | Eastern |
| London | °C | GMT |
| Buenos Aires | °C | ART |
| Ankara | °C | TRT |
| Seoul | °C | KST |
Seoul deserves special note: at UTC+9, our 10 AM ET lock corresponds to 11 PM the same day in Seoul during daylight saving time, and midnight the next day in winter. Our pipeline handles this edge case automatically.
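The timezone arithmetic is easy to verify with Python's standard `zoneinfo` module (the specific date below is just an example):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A winter lock time: 10 AM Eastern is UTC-5 in January.
lock = datetime(2024, 1, 15, 10, 0, tzinfo=ZoneInfo("America/New_York"))
seoul = lock.astimezone(ZoneInfo("Asia/Seoul"))
print(seoul)  # 2024-01-16 00:00:00+09:00 -- already the next day in Seoul
```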
How We Measure Success
In prediction markets, the ultimate test is not whether individual picks win. It is whether your probabilities are well-calibrated over hundreds of markets. Here is what we track:
Calibration Accuracy
When we say 30% probability, does that bracket resolve ~30% of the time? Perfect calibration means our confidence levels match real-world frequencies.
Brier Score
The gold standard for probability forecasts. Lower is better. It penalizes both overconfidence and underconfidence, rewarding precise probability estimates.
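For reference, the Brier score over a set of yes/no bracket outcomes is simply the mean squared error between the forecast probabilities and what actually happened. This is the standard definition, nothing proprietary:

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    Always forecasting 0.5 scores 0.25; a perfect forecaster scores 0."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

brier_score([0.9, 0.1, 0.8], [1, 0, 1])  # ≈ 0.02: confident and mostly right
```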
Profit & Loss
The bottom line. We track P&L for every locked pick across every city, measuring actual returns from the edges we identify.
Edge Consistency
Are edges randomly distributed, or do they cluster in specific cities, seasons, or bracket types? Consistent patterns validate the model; random streaks do not.
The Bottom Line
Weather prediction markets reward one thing above all else: being better calibrated than the crowd. We do not predict the weather. The world's supercomputers do that. What we do is translate their raw output into precise, station-calibrated probability distributions and find the moments where the market's pricing diverges from the physics. We are the referee, not the player. We do not make the weather. We call the score.
- Aggregate the best forecasts: from multiple independent systems with hundreds of ensemble members
- Correct systematic biases: per city, per season, per model, with hundreds of learned parameters
- Build true probability distributions: not point forecasts, because markets trade on uncertainty
- Act only with edge: skipping thin margins and illiquid brackets without hesitation
- Track everything: calibration, Brier scores, and P&L with full transparency
The weather is chaotic. Markets are imperfect. Our job is to stand in the gap between what the physics says and what the crowd believes, and only step in when the numbers are on our side.
Technical Breakdown
A deeper look at how the pipeline transforms raw atmospheric simulations into tradeable picks, step by step, like an assembly line where each station adds precision.
Ensemble Ingestion
We pull ensemble forecasts from multiple independent global forecast systems, each running dozens of members with slightly perturbed initial conditions. For each member, we extract the daily maximum temperature at the target station using only native-resolution timesteps. No interpolated fill, only real model output.
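A sketch of the extraction step, assuming each member arrives as timestamped temperatures at the model's native resolution (the data shape and function name here are illustrative):

```python
from datetime import date, datetime

def daily_max(member_timesteps, target_date):
    """member_timesteps: list of (timestamp, temperature) pairs at the
    model's native resolution. Return the max over the target calendar
    day using only real model output -- no interpolated fill."""
    temps = [t for ts, t in member_timesteps if ts.date() == target_date]
    return max(temps) if temps else None

# Illustrative 3-hourly member output for one day.
steps = [(datetime(2024, 6, 1, h), 60 + h) for h in range(0, 24, 3)]
daily_max(steps, date(2024, 6, 1))  # 81: the 9 PM timestep
```

A production version would also map timestamps into the station's local timezone before defining the calendar day, for exactly the Seoul-style reasons discussed above.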
Bias Correction
Every raw daily max passes through a learned correction function tailored to its specific forecasting context. The result: hundreds of bias-corrected temperatures calibrated to the specific location and time of year.
Probability Distribution
The corrected samples are fed into a proprietary probability engine. More accurate forecast systems get higher weights, so their members pull the probability curve toward them, like giving the best jurors louder voices.
- Aggregated from multiple forecast systems for robust uncertainty estimation
- Higher-accuracy systems contribute more to the final probability output
- Explicit lower and upper tail brackets capture extreme temperature scenarios
The output is a complete probability distribution across all possible temperature brackets, with built-in safeguards to prevent any outcome from being rated as impossible. Probabilities are normalized and passed through a calibration layer that corrects any remaining systematic bias.
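The blend-floor-normalize idea can be sketched as follows. The weights, floor value, and data shapes are illustrative assumptions; the real engine is more sophisticated, but the safeguards work the same way:

```python
def weighted_distribution(system_probs, weights, floor=1e-3):
    """system_probs: {system: {bracket: prob}}; weights: {system: skill}.
    Blend per-system bracket probabilities by skill weight, apply a small
    floor so no bracket is rated impossible, then renormalize."""
    brackets = {b for probs in system_probs.values() for b in probs}
    total_w = sum(weights.values())
    blended = {b: sum(weights[s] * system_probs[s].get(b, 0.0)
                      for s in system_probs) / total_w
               for b in brackets}
    floored = {b: max(p, floor) for b, p in blended.items()}
    z = sum(floored.values())
    return {b: p / z for b, p in floored.items()}
```

With two systems where the more skilled one (weight 3 vs 1) puts everything on one bracket, that bracket ends up carrying most of the final probability, exactly the "louder jurors" effect described above.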
Edge Detection
Model brackets are aligned to Polymarket's bracket labels. For each matched bracket, we compute the edge: our model's probability minus the market-implied probability.
Positive edge → buy YES (the model says the bracket is more likely than the market prices it). Negative edge → buy NO (the model says it is less likely). Only edges exceeding the configured threshold survive to the next stage.
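In code, the decision rule above reduces to a few lines. The threshold value and names are hypothetical placeholders for the configured parameters:

```python
def edge_signal(model_prob, market_price, threshold=0.05):
    """Edge = model probability minus market-implied probability.
    Beyond +threshold -> buy YES; beyond -threshold -> buy NO; else pass."""
    edge = model_prob - market_price
    if edge > threshold:
        return "BUY_YES", edge
    if edge < -threshold:
        return "BUY_NO", edge
    return "PASS", edge
```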
Quality Gates
Every pick must survive a gauntlet of filters before surfacing. Every filter is mandatory.
- Minimum absolute edge threshold exceeded
- Market liquidity above configured minimum
- Valid, non-stale market price data available
- Per-city confidence weight above zero
- Raw model probability below guard-rail ceiling
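Because every filter is mandatory, the gate is a simple conjunction: fail any check and the pick is rejected. A sketch, with field and config names invented for illustration:

```python
def passes_gates(pick, cfg):
    """Return True only if every mandatory filter passes.
    pick and cfg are hypothetical dicts standing in for real objects."""
    checks = [
        abs(pick["edge"]) >= cfg["min_edge"],          # edge threshold
        pick["liquidity"] >= cfg["min_liquidity"],     # market depth
        not pick["price_stale"],                       # fresh price data
        pick["city_confidence"] > 0,                   # city is trusted
        pick["model_prob"] <= cfg["prob_ceiling"],     # guard-rail ceiling
    ]
    return all(checks)
```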
Ranking & Selection
Surviving candidates are scored using a proprietary composite metric that balances multiple factors including edge strength, historical model confidence, and market conditions. Picks are then sorted by score and capped, both per-city and globally, to enforce diversification and prevent overexposure to any single market.
The highest-scoring picks are surfaced. The rest are logged with rejection reasons for post-cycle analysis.
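The sort-and-cap logic can be sketched like this. The cap values are placeholders and the scoring field stands in for the proprietary composite metric:

```python
def select_picks(candidates, per_city_cap=2, global_cap=5):
    """candidates: list of dicts with 'city' and 'score' keys.
    Take picks in descending score order, capping per city and globally
    to enforce diversification."""
    picked, city_counts = [], {}
    for c in sorted(candidates, key=lambda x: -x["score"]):
        if len(picked) >= global_cap:
            break
        if city_counts.get(c["city"], 0) >= per_city_cap:
            continue  # would be logged with a rejection reason
        picked.append(c)
        city_counts[c["city"]] = city_counts.get(c["city"], 0) + 1
    return picked
```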
Fault Isolation
Each city is processed independently, like watertight compartments on a ship. If a data source fails for one city, the rest continue normally. Persistence is split into two phases: irreplaceable forecast data is saved before pick generation. If pick selection fails, the raw data survives and picks can be regenerated. The same applies at the model level: if one forecast system is unavailable, the remaining systems still produce a valid distribution.
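Structurally, the compartment pattern is just a per-city try/except with forecast data persisted before pick generation. A sketch with hypothetical callables standing in for the real pipeline stages:

```python
def run_cycle(cities, process_city, save_forecasts, generate_picks):
    """Each city runs in its own compartment: a failure in one city
    never stops the others, and irreplaceable forecast data is saved
    before picks are generated so it survives a downstream failure."""
    results = {}
    for city in cities:
        try:
            forecasts = process_city(city)
            save_forecasts(city, forecasts)  # persist raw data first
            results[city] = generate_picks(city, forecasts)
        except Exception as err:
            results[city] = {"error": str(err)}  # isolate and keep going
    return results
```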
Monthly Access
- Predictions only go live when the model finds true edge
- Closing line value tracked on every prediction so you can verify it yourself
- Covers every market we model and we're always adding more
- Cheaper than your average unit size
Annual Access
- Get 4 months free on us when you go annual
- Every new model we ship is included automatically
- Full platform access for less than most services charge monthly
- Models run 365 days a year; your subscription should too