The Forecast Has a Price
How we harness the world's most powerful weather supercomputers, correct their blind spots, and find the moments when prediction markets get the weather wrong.
Every day, Polymarket runs prediction markets asking a deceptively simple question: "What will the high temperature be in [city] tomorrow?" Traders buy and sell shares on exact temperature brackets like "72°F" or "24°C," and the price of each share reflects the crowd's consensus probability that the temperature will land in that bracket.
Think of the market as a roulette wheel with 20+ slots, except the outcome is governed by atmospheric physics rather than random chance. The crowd places its chips. Our job is to figure out which slots are overpriced and which are underpriced.
We do not try to out-predict the weather. The world's meteorological agencies spend billions of dollars on that problem and they are very good at it. Instead, we stand on their shoulders: we ingest their ensemble forecasts, scrub out systematic errors, build calibrated probability distributions, and compare them against what the market is pricing. When the crowd disagrees with the physics, we act.
The Challenge
Why Temperature Markets Are Deceptively Complex
Predicting whether tomorrow will be "warm" or "cold" is easy. Your weather app does that just fine. But predicting that the high will land in a specific 1-degree bracket out of 20+ possible outcomes is a fundamentally different problem. It is the difference between guessing that a dart will hit the board versus calling the exact number it lands on. Most people can feel the direction. Almost nobody gets the precision right.
Bracket Precision
Markets resolve to exact degree brackets. Being off by just 1° means the difference between a winning and losing position.
Station Specificity
Markets resolve at a specific weather station, not a city average. Microclimates near airports can diverge from downtown conditions.
Forecast Bias
Raw weather models have systematic biases that vary by location, season, and model. Uncorrected forecasts lead to miscalibrated probabilities.
The opportunity: Most traders rely on a single weather app or rough estimates. We aggregate hundreds of ensemble members from multiple world-class forecast systems, correct their biases, and compute true probability distributions. When the crowd gets the shape of uncertainty wrong, the edge can be significant.
The Ensemble Engine
The Ensemble Advantage
Imagine you need to estimate the temperature tomorrow. You could ask one expert, but even the best meteorologist has blind spots. Now imagine assembling a jury of over 200 independent experts, each running their own physics simulation of the atmosphere with slightly different starting conditions. Some emphasize ocean currents. Others weight solar radiation or jet stream positioning differently. Each produces a slightly different forecast.
That is exactly how ensemble forecasting works. Each "juror" is a perturbed run of a full atmospheric simulation: same physics engine, different initial conditions. When the jurors agree, confidence is high. When they scatter, uncertainty is real. The shape of their disagreement is the signal we trade on.
| Concept | Analogy |
|---|---|
| Ensemble member | One juror's verdict: informed but imperfect |
| Ensemble spread | How much the jury disagrees. Wide spread = high uncertainty |
| Weighted average | The final verdict, giving more weight to historically accurate jurors |
| Probability distribution | Not just the verdict, but how confident the jury is in each possible outcome |
Key insight: A single forecast says "tomorrow will be 72°F." Our ensemble says "there is a 35% chance of 71-72°F, a 28% chance of 73-74°F, and a 15% chance of 69-70°F." That probability shape is what lets us compare against market prices and find mispricings.
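The core move from members to probabilities can be sketched in a few lines. This is a simplified illustration, not our production code: the function name, the 1-degree brackets, and the sample values are all hypothetical. We simply count how many ensemble members land in each bracket and normalize.

```python
from collections import Counter

def bracket_probabilities(member_highs):
    """Turn ensemble member forecasts into per-bracket probabilities
    by counting how many members land in each 1-degree bracket."""
    counts = Counter(int(t) for t in member_highs)  # floor for positive temps
    total = len(member_highs)
    return {deg: n / total for deg, n in sorted(counts.items())}

# Eight illustrative members: three land in the 72°F bracket.
probs = bracket_probabilities([70.6, 71.2, 71.8, 72.1, 72.4, 72.9, 73.3, 74.0])
# probs[72] == 0.375
```

Real ensembles use hundreds of members and smoother density estimates, but the principle is the same: the histogram of member outcomes is the probability shape we trade on.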
Multiple Forecast Systems
We do not rely on a single weather model. We aggregate forecasts from multiple independent global forecast systems, each with different strengths. Some excel at short-range precision, others at capturing extreme scenarios.
Each system contributes dozens of ensemble members, giving us a rich picture of forecast uncertainty that no single model provides alone.
Weighted by Track Record
Not all forecast systems are created equal. Some consistently outperform others in specific contexts. We evaluate each system's track record and weight their contributions accordingly.
Like a fantasy lineup, we give more playing time to the most reliable performers, but keep the bench active for when conditions favor them.
Bias Correction
Every instrument has quirks. A bathroom scale that always reads 2 pounds heavy. A car speedometer that drifts above 60. Weather models are no different. They have systematic biases that skew their forecasts in predictable ways. The key word is predictable. A bias you can measure is a bias you can fix.
Calibrating the Instruments
But weather model biases are more nuanced than a bathroom scale. They vary across three dimensions simultaneously. A model that runs warm in summer might run cold in winter. The same model might be perfectly calibrated for coastal cities but consistently biased for inland ones. One forecast system might nail New York but struggle with Dallas. We learn these fingerprints individually and correct for each one.
- Station: each city's weather station has unique local effects that models handle differently
- Season: winter biases differ from summer biases, so we train separate corrections for each
- Model: each forecast system has its own fingerprint of biases that we learn independently
We maintain a comprehensive library of correction parameters tailored to each unique forecasting context. These are trained on historical forecast-versus-observation pairs and retrained periodically as conditions evolve.
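As a minimal sketch of what such a correction library might look like: this assumes a simple mean-offset model keyed by (city, season, model), with names and data shapes invented for illustration. The actual trained parameters are richer than a single offset.

```python
from collections import defaultdict

def fit_bias(history):
    """history: iterable of (city, season, model, forecast, observed).
    Learn the mean forecast-minus-observed offset for each context."""
    sums = defaultdict(lambda: [0.0, 0])
    for city, season, model, fc, obs in history:
        entry = sums[(city, season, model)]
        entry[0] += fc - obs
        entry[1] += 1
    return {key: s / n for key, (s, n) in sums.items()}

def correct(bias, city, season, model, forecast):
    # Subtract the learned bias; unseen contexts fall back to no correction.
    return forecast - bias.get((city, season, model), 0.0)
```

Retraining is then just re-running `fit_bias` on a fresh window of forecast-versus-observation pairs as conditions evolve.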
Finding Value
Where Edges Come From
The Polymarket crowd is not wrong on average. Collective wisdom is powerful. But prediction markets are made by humans, and humans have well-documented blind spots when it comes to estimating probabilities. The same cognitive biases that make people buy lottery tickets and fear flying more than driving show up in temperature markets. We exploit the gap between what people feel and what the physics says.
Four recurring patterns create structural edges:
- Probability mispricing: Traders tend to miscalibrate the likelihood of outcomes outside the consensus range. Our ensemble quantifies this uncertainty directly, revealing edges the crowd misses.
- Recency bias: Markets sometimes overweight recent conditions and underweight how quickly weather patterns can shift. Our model captures the full range of possible outcomes.
- Uncertainty underestimation: Traders often concentrate probability too tightly around a single expected outcome. Our proprietary approach directly measures the true width of the distribution.
- Market inefficiency: Some markets are thinner and less efficient than others. Fewer participants means less competition and larger potential edges.
The Daily Cycle
From Data to Decision
Like a newsroom racing to deadline, our pipeline runs multiple cycles each morning, each one incorporating fresher data than the last. Early runs use overnight model outputs. Later runs layer in morning updates. You never see the drafts, only the final, locked edition.
- 3:00 AM ET: overnight model runs ingested, bias corrections applied, initial probabilities computed
- 7:00 AM ET: morning model runs update probabilities, market prices fetched, edges recalculated
- 10:00 AM ET: picks lock and become visible. No changes after this point. Accountability is absolute
After lock: High-resolution monitoring continues for US cities, but locked picks are never modified. This ensures you always see the same picks we committed to before the outcome was known.
Global Coverage
10 Cities, 4 Continents
Each city resolves at a specific airport weather station. This matters. A station at a coastal airport may read differently from downtown conditions. Our bias corrections are trained per station.
| City | Unit | Timezone |
|---|---|---|
| Chicago | °F | Central |
| Miami | °F | Eastern |
| Atlanta | °F | Eastern |
| Dallas | °F | Central |
| Seattle | °F | Pacific |
| Toronto | °C | Eastern |
| London | °C | GMT |
| Buenos Aires | °C | ART |
| Ankara | °C | TRT |
| Seoul | °C | KST |
Seoul deserves special note: at UTC+9, our 10 AM ET lock corresponds to 11 PM the same day in Seoul during daylight saving time, and midnight the next day in winter. Our pipeline handles this edge case automatically.
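The timezone arithmetic is easy to verify with Python's standard `zoneinfo` module (the specific date below is just an example):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A winter lock time: 10 AM Eastern is UTC-5 in January.
lock = datetime(2024, 1, 15, 10, 0, tzinfo=ZoneInfo("America/New_York"))
seoul = lock.astimezone(ZoneInfo("Asia/Seoul"))
print(seoul)  # 2024-01-16 00:00:00+09:00 -- already the next day in Seoul
```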
How We Measure Success
In prediction markets, the ultimate test is not whether individual picks win. It is whether your probabilities are well-calibrated over hundreds of markets. Here is what we track:
Calibration Accuracy
When we say 30% probability, does that bracket resolve ~30% of the time? Perfect calibration means our confidence levels match real-world frequencies.
Brier Score
The gold standard for probability forecasts. Lower is better. It penalizes both overconfidence and underconfidence, rewarding precise probability estimates.
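For reference, the Brier score over a set of yes/no bracket outcomes is simply the mean squared error between the forecast probabilities and what actually happened. This is the standard definition, nothing proprietary:

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    Always forecasting 0.5 scores 0.25; a perfect forecaster scores 0."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

brier_score([0.9, 0.1, 0.8], [1, 0, 1])  # ≈ 0.02: confident and mostly right
```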
Profit & Loss
The bottom line. We track P&L for every locked pick across every city, measuring actual returns from the edges we identify.
Edge Consistency
Are edges randomly distributed, or do they cluster in specific cities, seasons, or bracket types? Consistent patterns validate the model; random streaks do not.
The Bottom Line
Weather prediction markets reward one thing above all else: being better calibrated than the crowd. We do not predict the weather. The world's supercomputers do that. What we do is translate their raw output into precise, station-calibrated probability distributions and find the moments where the market's pricing diverges from the physics. We are the referee, not the player. We do not make the weather. We call the score.
- Aggregate the best forecasts: from multiple independent systems with hundreds of ensemble members
- Correct systematic biases: per city, per season, per model, with hundreds of learned parameters
- Build true probability distributions: not point forecasts, because markets trade on uncertainty
- Act only with edge: skipping thin margins and illiquid brackets without hesitation
- Track everything: calibration, Brier scores, and P&L with full transparency
The weather is chaotic. Markets are imperfect. Our job is to stand in the gap between what the physics says and what the crowd believes, and only step in when the numbers are on our side.
Technical Breakdown
A deeper look at how the pipeline transforms raw atmospheric simulations into tradeable picks, step by step, like an assembly line where each station adds precision.
Ensemble Ingestion
We pull ensemble forecasts from multiple independent global forecast systems, each running dozens of members with slightly perturbed initial conditions. For each member, we extract the daily maximum temperature at the target station using only native-resolution timesteps. No interpolated fill, only real model output.
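A sketch of the extraction step, assuming each member arrives as timestamped temperatures at the model's native resolution (the data shape and function name here are illustrative):

```python
from datetime import date, datetime

def daily_max(member_timesteps, target_date):
    """member_timesteps: list of (timestamp, temperature) pairs at the
    model's native resolution. Return the max over the target calendar
    day using only real model output -- no interpolated fill."""
    temps = [t for ts, t in member_timesteps if ts.date() == target_date]
    return max(temps) if temps else None

# Illustrative 3-hourly member output for one day.
steps = [(datetime(2024, 6, 1, h), 60 + h) for h in range(0, 24, 3)]
daily_max(steps, date(2024, 6, 1))  # 81: the 9 PM timestep
```

A production version would also map timestamps into the station's local timezone before defining the calendar day, for exactly the Seoul-style reasons discussed above.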
Bias Correction
Every raw daily max passes through a learned correction function tailored to its specific forecasting context. The result: hundreds of bias-corrected temperatures calibrated to the specific location and time of year.
Probability Distribution
The corrected samples are fed into a proprietary probability engine. More accurate forecast systems get higher weights, so their members pull the probability curve toward them, like giving the best jurors louder voices.
- Aggregated from multiple forecast systems for robust uncertainty estimation
- Higher-accuracy systems contribute more to the final probability output
- Explicit lower and upper tail brackets capture extreme temperature scenarios
The output is a complete probability distribution across all possible temperature brackets, with built-in safeguards to prevent any outcome from being rated as impossible. Probabilities are normalized and passed through a calibration layer that corrects any remaining systematic bias.
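The blend-floor-normalize idea can be sketched as follows. The weights, floor value, and data shapes are illustrative assumptions; the real engine is more sophisticated, but the safeguards work the same way:

```python
def weighted_distribution(system_probs, weights, floor=1e-3):
    """system_probs: {system: {bracket: prob}}; weights: {system: skill}.
    Blend per-system bracket probabilities by skill weight, apply a small
    floor so no bracket is rated impossible, then renormalize."""
    brackets = {b for probs in system_probs.values() for b in probs}
    total_w = sum(weights.values())
    blended = {b: sum(weights[s] * system_probs[s].get(b, 0.0)
                      for s in system_probs) / total_w
               for b in brackets}
    floored = {b: max(p, floor) for b, p in blended.items()}
    z = sum(floored.values())
    return {b: p / z for b, p in floored.items()}
```

With two systems where the more skilled one (weight 3 vs 1) puts everything on one bracket, that bracket ends up carrying most of the final probability, exactly the "louder jurors" effect described above.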
Edge Detection
Model brackets are aligned to Polymarket's bracket labels. For each matched bracket, we compute the edge: our model's probability minus the market-implied probability.
Positive edge → buy YES (the model says the bracket is more likely than the market prices it). Negative edge → buy NO (the model says it is less likely). Only edges exceeding the configured threshold survive to the next stage.
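In code, the decision rule above reduces to a few lines. The threshold value and names are hypothetical placeholders for the configured parameters:

```python
def edge_signal(model_prob, market_price, threshold=0.05):
    """Edge = model probability minus market-implied probability.
    Beyond +threshold -> buy YES; beyond -threshold -> buy NO; else pass."""
    edge = model_prob - market_price
    if edge > threshold:
        return "BUY_YES", edge
    if edge < -threshold:
        return "BUY_NO", edge
    return "PASS", edge
```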
Quality Gates
Every pick must survive a gauntlet of filters before surfacing. Every filter is mandatory.
- Minimum absolute edge threshold exceeded
- Market liquidity above configured minimum
- Valid, non-stale market price data available
- Per-city confidence weight above zero
- Raw model probability below guard-rail ceiling
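Because every filter is mandatory, the gate is a simple conjunction: fail any check and the pick is rejected. A sketch, with field and config names invented for illustration:

```python
def passes_gates(pick, cfg):
    """Return True only if every mandatory filter passes.
    pick and cfg are hypothetical dicts standing in for real objects."""
    checks = [
        abs(pick["edge"]) >= cfg["min_edge"],          # edge threshold
        pick["liquidity"] >= cfg["min_liquidity"],     # market depth
        not pick["price_stale"],                       # fresh price data
        pick["city_confidence"] > 0,                   # city is trusted
        pick["model_prob"] <= cfg["prob_ceiling"],     # guard-rail ceiling
    ]
    return all(checks)
```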
Ranking & Selection
Surviving candidates are scored using a proprietary composite metric that balances multiple factors including edge strength, historical model confidence, and market conditions. Picks are then sorted by score and capped, both per-city and globally, to enforce diversification and prevent overexposure to any single market.
The highest-scoring picks are surfaced. The rest are logged with rejection reasons for post-cycle analysis.
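The sort-and-cap logic can be sketched like this. The cap values are placeholders and the scoring field stands in for the proprietary composite metric:

```python
def select_picks(candidates, per_city_cap=2, global_cap=5):
    """candidates: list of dicts with 'city' and 'score' keys.
    Take picks in descending score order, capping per city and globally
    to enforce diversification."""
    picked, city_counts = [], {}
    for c in sorted(candidates, key=lambda x: -x["score"]):
        if len(picked) >= global_cap:
            break
        if city_counts.get(c["city"], 0) >= per_city_cap:
            continue  # would be logged with a rejection reason
        picked.append(c)
        city_counts[c["city"]] = city_counts.get(c["city"], 0) + 1
    return picked
```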
Fault Isolation
Each city is processed independently, like watertight compartments on a ship. If a data source fails for one city, the rest continue normally. Persistence is split into two phases: irreplaceable forecast data is saved before pick generation. If pick selection fails, the raw data survives and picks can be regenerated. The same applies at the model level: if one forecast system is unavailable, the remaining systems still produce a valid distribution.
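Structurally, the compartment pattern is just a per-city try/except with forecast data persisted before pick generation. A sketch with hypothetical callables standing in for the real pipeline stages:

```python
def run_cycle(cities, process_city, save_forecasts, generate_picks):
    """Each city runs in its own compartment: a failure in one city
    never stops the others, and irreplaceable forecast data is saved
    before picks are generated so it survives a downstream failure."""
    results = {}
    for city in cities:
        try:
            forecasts = process_city(city)
            save_forecasts(city, forecasts)  # persist raw data first
            results[city] = generate_picks(city, forecasts)
        except Exception as err:
            results[city] = {"error": str(err)}  # isolate and keep going
    return results
```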
Monthly Access
- Predictions only go live when the model finds true edge
- Closing line value tracked on every prediction so you can verify it yourself
- Covers every market we model and we're always adding more
- Cheaper than your average unit size
Annual Access
- Get 4 months free on us when you go annual
- Every new model we ship is included automatically
- Full platform access for less than most services charge monthly
- Models run 365 days a year; your subscription should too