Why a 1997 statistics paper still beats the experts at football scorelines
Mark Dixon and Stuart Coles fixed a small but ugly hole in the Poisson model of football, and 28 years later we still haven't found a better starting point.
By WorldSportsXAI Editorial
If you have ever tried to model football scorelines with a plain Poisson distribution, you have run into the same problem everyone else has: the four lowest scorelines, 0-0, 1-0, 0-1, and 1-1, do not occur at the rates a simple Poisson model thinks they should. The low-scoring draws, 0-0 and 1-1, happen more often in the real world than the model predicts, and 1-0 and 0-1 a little less often. The scoreboard does not cooperate, and your forecasts come out a touch short on draws.
Mark Dixon and Stuart Coles published a paper in 1997 — Modelling Association Football Scores and Inefficiencies in the Football Betting Market — that did two things. First, they wrote down a small correction term τ that bumps the four low-score cells up or down to match observed frequencies. Second, they added an exponential time-decay weighting so the model pays attention to recent matches without throwing away older ones entirely. That is the entire idea. It is small. It works. Twenty-eight years later, "Dixon-Coles" is still the first model serious football modelers reach for.
The Poisson assumption, and where it breaks
The intuition behind Poisson scoring is reasonable. Goals are rare, roughly independent events that happen at some rate per minute, so the number of goals a team scores in a match should follow a Poisson distribution with mean λ. If you also assume the home and away goal counts X and Y, with means λ_h and λ_a, are independent, then the probability of any scoreline (x, y) is the product of two Poisson probabilities. Cheap to compute, easy to fit, easy to teach.
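A minimal sketch of that independent model, using only the Python standard library. The function names and the rates 1.4 and 1.1 are illustrative assumptions, not fitted values, and the grid is truncated at 10 goals per side.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def independent_score_matrix(lam_home: float, lam_away: float, max_goals: int = 10):
    """matrix[x][y] = P(home scores x, away scores y) if the counts are independent."""
    home = [poisson_pmf(x, lam_home) for x in range(max_goals + 1)]
    away = [poisson_pmf(y, lam_away) for y in range(max_goals + 1)]
    return [[hx * ay for ay in away] for hx in home]

# Illustrative scoring rates (assumed, not fitted).
matrix = independent_score_matrix(1.4, 1.1)
n = len(matrix)

# Collapse the scoreline grid into the three match outcomes.
p_home_win = sum(matrix[x][y] for x in range(n) for y in range(x))
p_draw = sum(matrix[k][k] for k in range(n))
p_away_win = sum(matrix[x][y] for x in range(n) for y in range(x + 1, n))

print(f"home {p_home_win:.3f}  draw {p_draw:.3f}  away {p_away_win:.3f}")
```

Everything downstream, including win/draw/loss probabilities, is a sum over cells of this grid, which is why a small error in a few heavily populated cells matters.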
The problem is that the two teams' goal counts are not actually independent at the low-scoring end. When a match is 0-0 in the 80th minute, both teams play differently than they would at 2-2. When one team is up 1-0 late, they defend deeper. The result is that 0-0 and 1-1 results are more frequent than independent Poisson predicts, and 1-0 / 0-1 results are slightly less frequent. The pattern shows up in every league we have data for, going back to the 1990s.
The fix, in one paragraph
Dixon and Coles' correction multiplies the joint Poisson probability of a scoreline (x, y), home goals first, by a small factor τ(x, y) that differs from 1 only in the four low-score cells:
τ(0,0) = 1 - λ_h · λ_a · ρ
τ(0,1) = 1 + λ_h · ρ
τ(1,0) = 1 + λ_a · ρ
τ(1,1) = 1 - ρ
τ(x,y) = 1 otherwise
The parameter ρ is fit to data. In practice it lands around -0.05 for international football. With ρ negative, τ is above 1 for 0-0 and 1-1 and below 1 for 1-0 and 0-1, so the low-scoring draws come out slightly more common than independent Poisson says and the one-goal wins slightly less common. The numbers are small. The effect on forecast calibration is not. In our own fit, adding the τ correction improves Brier score by roughly 1.5% versus plain Poisson: modest, but consistent across decades of matches.
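The τ factor is short enough to write out directly. The sketch below is a hedged illustration, not the fitted model: the rates 1.4 and 1.1 are assumed for the example, ρ = -0.05 is the approximate value quoted above, and the function names are made up here. It prints the plain and adjusted probability for each of the four low-score cells so the direction of each adjustment can be read off.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def tau(x: int, y: int, lam_h: float, lam_a: float, rho: float) -> float:
    """Low-score adjustment: differs from 1 only for 0-0, 0-1, 1-0 and 1-1."""
    if x == 0 and y == 0:
        return 1.0 - lam_h * lam_a * rho
    if x == 0 and y == 1:
        return 1.0 + lam_h * rho
    if x == 1 and y == 0:
        return 1.0 + lam_a * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0

def dixon_coles_prob(x: int, y: int, lam_h: float, lam_a: float, rho: float) -> float:
    """Adjusted probability of the scoreline x-y (home goals first)."""
    return tau(x, y, lam_h, lam_a, rho) * poisson_pmf(x, lam_h) * poisson_pmf(y, lam_a)

# Assumed example rates; rho is the approximate value quoted in the text.
lam_h, lam_a, rho = 1.4, 1.1, -0.05

for x, y in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    plain = poisson_pmf(x, lam_h) * poisson_pmf(y, lam_a)
    adjusted = dixon_coles_prob(x, y, lam_h, lam_a, rho)
    print(f"{x}-{y}: plain {plain:.4f}  adjusted {adjusted:.4f}")

# The four adjustments cancel, so the grid still sums to 1.
total = sum(dixon_coles_prob(x, y, lam_h, lam_a, rho)
            for x in range(15) for y in range(15))
```

A useful property of this particular form is that the four changes cancel exactly, so the adjusted grid is still a proper probability distribution with no renormalisation step.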
Time decay: why old matches still matter, just less
The second Dixon-Coles innovation is a weighting scheme. Instead of treating every historical match equally, we weight each match by exp(-ξ · days_since_match). We use ξ = 0.0019 per day, which means a match's weight halves roughly every 365 days (the half-life is ln 2 / ξ). A friendly from 2014 still contributes information, just less than a UEFA Nations League fixture from last month.
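The weighting is easy to check numerically. A small sketch follows, with ξ taken from the text and everything else (function name, the fitting date, the two match dates) assumed for illustration.

```python
import math
from datetime import date

XI = 0.0019  # decay rate per day, the value quoted in the text

def match_weight(match_date: date, as_of: date, xi: float = XI) -> float:
    """exp(-xi * days_since_match); a match played on the as_of date gets weight 1."""
    days = (as_of - match_date).days
    if days < 0:
        raise ValueError("match_date is after as_of")
    return math.exp(-xi * days)

# Half-life implied by xi: the age at which a match counts half as much.
half_life_days = math.log(2) / XI

# Hypothetical fitting date and match dates, for illustration only.
as_of = date(2025, 6, 1)
w_recent = match_weight(date(2025, 5, 1), as_of)
w_old = match_weight(date(2014, 6, 1), as_of)

print(f"half-life: {half_life_days:.1f} days")
print(f"one month old: {w_recent:.3f}   eleven years old: {w_old:.5f}")
```

In a weighted fit of this kind, each match's log-likelihood term is multiplied by its weight, so an old match pulls on the team ratings in proportion to that number rather than being dropped outright.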
Why bother with old data at all? Because international football is sparse. Many national teams play 10-12 fixtures per year. Drop everything before 2022 and you are fitting attack/defense ratings for 200+ teams on maybe 1,500 matches total, most of which are confederation-internal. The time-decay weighting buys you twenty years of history while still letting Spain's 2024 form dominate Spain's 2010 form.
What we did not change
We kept the Poisson core: two Poisson goal counts, tied together only by the τ adjustment. We did not switch to a negative binomial, even though football scoring is mildly over-dispersed compared to Poisson, because the extra parameter does not buy meaningful calibration improvement at international level. We did not add a separate "shots-on-target" submodel. That is the right move for club football, where shot data is dense and consistent, but international shot data is patchy. We did not add weather or pitch effects.
We did add one thing: the home-field advantage parameter γ is zeroed out for neutral-venue matches. Almost every World Cup fixture is neutral. If you forget this and treat the World Cup like a normal qualifier, you will systematically overrate the team listed as "home" in the fixture list — and the bookmakers will eat your lunch.
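To make the neutral-venue switch concrete, here is a sketch under an assumed log-linear parameterisation, in which each team has an attack and a defence rating and γ is added to the home side's log rate. The article does not spell out the exact form, so the function, its arguments, and the ratings below are all hypothetical.

```python
import math

def expected_goals(attack_home: float, defence_home: float,
                   attack_away: float, defence_away: float,
                   gamma: float, neutral_venue: bool):
    """Expected goals (lam_h, lam_a) under an assumed log-linear rating model."""
    # On neutral ground the home-advantage term contributes nothing.
    home_boost = 0.0 if neutral_venue else gamma
    lam_h = math.exp(attack_home - defence_away + home_boost)
    lam_a = math.exp(attack_away - defence_home)
    return lam_h, lam_a

# Two evenly matched teams with hypothetical ratings and a hypothetical gamma.
qualifier = expected_goals(0.2, 0.1, 0.2, 0.1, gamma=0.25, neutral_venue=False)
world_cup = expected_goals(0.2, 0.1, 0.2, 0.1, gamma=0.25, neutral_venue=True)

print("qualifier:", qualifier)
print("neutral:  ", world_cup)
```

With identical ratings, the neutral fixture gives both sides the same expected goals, while the qualifier tilts toward whichever team happens to be listed first. That tilt is exactly the bias described above.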
What's next
The base model handles head-to-head match probabilities. To get full tournament probabilities — "what is the chance Brazil makes the semifinals?" — you need to run the model forward through the bracket. We do that 10,000 times via Monte Carlo, sampling a score from the fitted distribution for each match and advancing the winner. That is where stage-reach probabilities come from on our predictions page.
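The simulation loop can be sketched in a few dozen lines. This is a toy single-elimination bracket, not the code behind the predictions page: the team names and ratings are hypothetical, the log-linear rating form is an assumption, the score grid is truncated at 8 goals, and drawn knockout matches are settled by a coin flip as a stand-in for extra time and penalties. For brevity it tracks only title probability rather than every stage.

```python
import math
import random
from functools import lru_cache

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def tau(x, y, lam_a, lam_b, rho):
    """Low-score adjustment; 1.0 outside the four low-score cells."""
    if (x, y) == (0, 0):
        return 1.0 - lam_a * lam_b * rho
    if (x, y) == (0, 1):
        return 1.0 + lam_a * rho
    if (x, y) == (1, 0):
        return 1.0 + lam_b * rho
    if (x, y) == (1, 1):
        return 1.0 - rho
    return 1.0

@lru_cache(maxsize=None)
def score_distribution(lam_a, lam_b, rho, max_goals=8):
    """Scorelines and their (unnormalised) adjusted weights on a truncated grid."""
    scores, weights = [], []
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            scores.append((x, y))
            weights.append(tau(x, y, lam_a, lam_b, rho)
                           * poisson_pmf(x, lam_a) * poisson_pmf(y, lam_b))
    return tuple(scores), tuple(weights)

def play_knockout(team_a, team_b, ratings, rho, rng):
    """Sample one neutral-venue scoreline and return the team that advances."""
    att_a, def_a = ratings[team_a]
    att_b, def_b = ratings[team_b]
    lam_a = math.exp(att_a - def_b)  # assumed log-linear form, no home term
    lam_b = math.exp(att_b - def_a)
    scores, weights = score_distribution(lam_a, lam_b, rho)
    goals_a, goals_b = rng.choices(scores, weights=weights, k=1)[0]
    if goals_a == goals_b:
        # Assumption: a level match is decided by a fair coin flip.
        return rng.choice([team_a, team_b])
    return team_a if goals_a > goals_b else team_b

def simulate_bracket(teams, ratings, rho, n_sims=10_000, seed=0):
    """Title probability per team; teams are in bracket order, count a power of two."""
    if len(teams) & (len(teams) - 1):
        raise ValueError("number of teams must be a power of two")
    rng = random.Random(seed)
    titles = {team: 0 for team in teams}
    for _ in range(n_sims):
        alive = list(teams)
        while len(alive) > 1:
            alive = [play_knockout(alive[i], alive[i + 1], ratings, rho, rng)
                     for i in range(0, len(alive), 2)]
        titles[alive[0]] += 1
    return {team: count / n_sims for team, count in titles.items()}

# Hypothetical (attack, defence) ratings for a four-team bracket.
ratings = {"Team A": (0.6, 0.3), "Team B": (0.0, 0.0),
           "Team C": (0.0, 0.0), "Team D": (0.0, 0.0)}
title_probs = simulate_bracket(tuple(ratings), ratings, rho=-0.05)
print(title_probs)
```

Stage-reach probabilities come from the same loop by also counting, for each team, how often it is still alive after each round.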
The full Python implementation is open source on our GitHub. It is roughly 600 lines, mostly numpy and scipy. You can pull it, refit it on your own data, or argue with our choice of priors. We would rather you do that than take the model on faith.