John Shedletsky
John Shedletsky

Reputation: 7168

Algorithm to calculate the odds of a team winning a sports match given full history

Assumptions:

For example:

I have a long list of match outcomes that look like this:

Team A beats Team B
Team B beats Team A
Team A beats Team B
Team C beats Team A
Team A beats Team C

Problem:

Predict the correct betting odds of any team beating any other team.

In the example above, maybe we conclude that A should beat B 66% of the time. That is based off direct observation and is pretty straightforward. However, finding the probability that C beats B seems harder. They've never played together, yet it seems like most likely that C > B, with some low confidence.

Research I've Done:

I've read a fair bit about different ranking systems for games of skill, such as the Elo and Glicko rating systems for Chess. These fall short because they make assumptions about the probability distributions involved. For example, Elo's central assumption was that the chess performance of each player in each game is a normally distributed random variable. However, according to wikipedia, there are other distributions that fit the existing data better.

I don't want to assume a distribution. It seems to me that with 10,000+ match results on hand that I should be able to either deduce the distribution from the evidence (I don't know how to do this), or use some sort of reinforcement learning scheme that doesn't care what the distribution is.

Upvotes: 8

Views: 33466

Answers (7)

Steve Stokes
Steve Stokes

Reputation: 1230

Ratings Percentage Index - implemented in C# below:

// <copyright file="RPICalculator.cs" company="Steve Stokes Consulting, LLC">
//  Copyright © Steve Stokes Consulting, LLC. All Rights Reserved.
// </copyright>

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace RPI.Calculator
{
    public class RPICalculator
    {
        public void Test()
        {
            //Home      Score   Away        Score
            //UConn     64      Kansas      57
            //UConn     82      Duke        68
            //Minnesota 71      UConn       72
            //Kansas    69      UConn       62
            //Duke      81      Minnesota   70
            //Minnesota 52      Kansas      62

            //resulting in the following ratings:
            //UConn: 0.6660
            //Kansas: 0.6378
            //Duke: 0.3840
            //Minnesota: 0.3403

            List<Team> Teams = new List<Team>();
            List<Match> Matches = new List<Match>();

            Teams.Add(new Team()
            {
                Name = "UConn",
            });
            Teams.Add(new Team()
            {
                Name = "Kansas",
            });
            Teams.Add(new Team()
            {
                Name = "Duke",
            });
            Teams.Add(new Team()
            {
                Name = "Minnesota",
            });

            Matches.Add(new Match()
            {
                HomeTeam = Teams.Where(t => t.Name == "UConn").First(),
                AwayTeam = Teams.Where(t => t.Name == "Kansas").First(),
                Winner = Teams.Where(t => t.Name == "UConn").First(),
            });
            Matches.Add(new Match()
            {
                HomeTeam = Teams.Where(t => t.Name == "UConn").First(),
                AwayTeam = Teams.Where(t => t.Name == "Duke").First(),
                Winner = Teams.Where(t => t.Name == "UConn").First(),
            });
            Matches.Add(new Match()
            {
                HomeTeam = Teams.Where(t => t.Name == "Minnesota").First(),
                AwayTeam = Teams.Where(t => t.Name == "UConn").First(),
                Winner = Teams.Where(t => t.Name == "UConn").First(),
            });
            Matches.Add(new Match()
            {
                HomeTeam = Teams.Where(t => t.Name == "Kansas").First(),
                AwayTeam = Teams.Where(t => t.Name == "UConn").First(),
                Winner = Teams.Where(t => t.Name == "Kansas").First(),
            });
            Matches.Add(new Match()
            {
                HomeTeam = Teams.Where(t => t.Name == "Duke").First(),
                AwayTeam = Teams.Where(t => t.Name == "Minnesota").First(),
                Winner = Teams.Where(t => t.Name == "Duke").First(),
            });
            Matches.Add(new Match()
            {
                HomeTeam = Teams.Where(t => t.Name == "Minnesota").First(),
                AwayTeam = Teams.Where(t => t.Name == "Kansas").First(),
                Winner = Teams.Where(t => t.Name == "Kansas").First(),
            });

            var results = Calculate(Teams, Matches, Sport.NCAA_Basketball);

            foreach (var team in results)
            {
                Debug.WriteLine(string.Format("{0} - {1}", team.Name.PadRight(Teams.Select(t => t.Name).Max(s => s.Length)), team.RPI));
            }
        }

        private decimal defaultHomeWinValue = 1.0m;
        private decimal defaultAwayWinValue = 1.0m;
        private decimal homeWinValue = 1.0m;
        private decimal awayWinValue = 1.0m;

        public IEnumerable<TTeam> Calculate<TTeam, TMatch>(IEnumerable<TTeam> Teams, IEnumerable<TMatch> Matches, Sport Sport)
        {
            // TODO: transform a user team to our internal team type
        }

        /// <summary>
        /// Calculate the RPI of each team
        /// </summary>
        /// <param name="Teams">The list of teams to calculate RPI for</param>
        /// <param name="Matches">The list of matches and results of the matches the teams played in a period</param>
        /// <param name="Sport">The sport the teams played - this modifies home/away weighting based on NCAA rules</param>
        /// <returns>The list of teams with calculated RPI's</returns>
        public IEnumerable<Team> Calculate(IEnumerable<Team> Teams, IEnumerable<Match> Matches, Sport Sport)
        {
            SetWeightingBasedOnSport(Sport);

            foreach (var team in Teams)
            {
                // calculate the WP of each team
                team.WP = CalculateWinPercent(team, Matches, homeWinValue, awayWinValue);

                // calculate the OWP of each team
                team.OWP = CalculateOpponentsWinPercent(team, Matches);

                // calculate the OOWP of each team
                team.OOWP = CalculateOpponentsOpponentsWinPercent(team, Teams, Matches);

                // calculate the RPI for each team
                team.RPI = CalculateRPI(team);
            }

            return Teams.OrderByDescending(t => t.RPI);
        }

        private decimal CalculateRPI(Team team)
        {
            //RPI = (WP * 0.25) + (OWP * 0.50) + (OOWP * 0.25)

            //UConn: 0.6660
            //Kansas: 0.6378
            //Duke: 0.3840
            //Minnesota: 0.3403

            var RPI = (team.WP * 0.25m) + (team.OWP * 0.50m) + (team.OOWP * 0.25m);

            return Math.Round(RPI, 4);
        }

        private decimal CalculateOpponentsOpponentsWinPercent(Team teamInQuestion, IEnumerable<Team> Teams, IEnumerable<Match> Matches)
        {
            //UConn: ((Kansas 0.6667) + (Kansas 0.6667) + (Duke 0.3333) + (Minnesota 0.3889)) / (4 games) = 0.5139
            //Kansas: ((UConn 0.7500) + (UConn 0.7500) + (Minnesota 0.3889)) / (3 games) = 0.6296
            //Duke: ((UConn 0.7500) + (Minnesota 0.3889)) / (2 games) = 0.5694
            //Minnesota: ((UConn 0.7500) + (Duke 0.3333) + (Kansas 0.6667)) / (3 games) = 0.5833

            // get each team i've played this season (not unique)
            var teamsIvePlayed = Matches.Where(m => m.AwayTeam == teamInQuestion || m.HomeTeam == teamInQuestion).Select(s => s.HomeTeam == teamInQuestion ? s.AwayTeam : s.HomeTeam);

            // get the opponent win percent (OWP) of each team I played
            var teamsIvePlayedOpponentWinPercent = teamsIvePlayed.Select(t => new { Team = t, OWP = CalculateOpponentsWinPercent(t, Matches) });

            // calculate the OOWP
            return (decimal)(teamsIvePlayedOpponentWinPercent.Sum(t => t.OWP) / teamsIvePlayed.Count());
        }

        private decimal CalculateOpponentsWinPercent(Team teamInQuestion, IEnumerable<Match> Matches)
        {
            // get each teams WP without the team in question

            //Home      Score   Away        Score
            //UConn     64      Kansas      57
            //UConn     82      Duke        68
            //Minnesota 71      UConn       72
            //Kansas    69      UConn       62
            //Duke      81      Minnesota   70
            //Minnesota 52      Kansas      62

            //UConn: ((Kansas 1.0) + (Kansas 1.0) + (Duke 1.0) + (Minnesota 0)) / (4 games) = 0.7500
            //Kansas: ((UConn 1.0) + (UConn 1.0) + (Minnesota 0.0)) / (3 games) = 0.6667
            //Duke: ((UConn 0.6667) + (Minnesota 0.0)) / (2 games) = 0.3333
            //Minnesota: ((UConn 0.6667) + (Duke 0.0) + (Kansas 0.5)) / (3 games) = 0.3889

            // get each team i've played this season (not unique)
            var teamsIvePlayed = Matches.Where(m => m.AwayTeam == teamInQuestion || m.HomeTeam == teamInQuestion).Select(s => s.HomeTeam == teamInQuestion ? s.AwayTeam : s.HomeTeam);

            // get the win percent of each team I played excluding matches with me
            var teamsIvePlayedWinPercent = teamsIvePlayed.Select(t => new 
            { 
                Team = t, 
                WP = CalculateWinPercent(t, Matches.Where(m => m.AwayTeam != teamInQuestion && m.HomeTeam != teamInQuestion), defaultHomeWinValue, defaultAwayWinValue) 
            });

            // calculate the OWP
            return (decimal)(teamsIvePlayedWinPercent.Sum(t => t.WP) / teamsIvePlayed.Count());
        }

        private decimal CalculateWinPercent(Team teamInQuestion, IEnumerable<Match> Matches, decimal HomeWinValue, decimal AwayWinValue)
        {
            // get the teams win percent - sometimes weighted based on NCAA rules

            //UConn: (0.6 + 0.6 + 1.4 + 0) / (0.6 + 0.6 + 1.4 + 1.4) = 0.6500
            //Kansas: (0 + 0.6 + 1.4) / (1.4 + 0.6 + 1.4) = 0.5882
            //Duke: (0 + 0.6) / (1.4 + 0.6) = 0.3000
            //Minnesota: (0 + 0 + 0) / (0.6 + 1.4 + 0.6) = 0.0000

            // get my wins and sum with weighting
            var wins = Matches.Where(m => m.Winner == teamInQuestion).Sum(s => s.HomeTeam == teamInQuestion ? HomeWinValue : AwayWinValue);

            // get my games played count weighted
            var gamesPlayed = Matches.Where(m => m.HomeTeam == teamInQuestion || m.AwayTeam == teamInQuestion).Sum(s => s.HomeTeam == teamInQuestion ? HomeWinValue : AwayWinValue);

            // get the WP
            return wins / gamesPlayed;
        }

        private void SetWeightingBasedOnSport(Sport Sport)
        {
            switch (Sport)
            {
                case Sport.NCAA_Basketball:
                    homeWinValue = 0.6m;
                    awayWinValue = 1.4m;
                    break;
                case Sport.NCAA_Baseball:
                    homeWinValue = 0.7m;
                    awayWinValue = 1.3m;
                    break;
                default:
                    homeWinValue = defaultHomeWinValue;
                    awayWinValue = defaultAwayWinValue;
                    break;
            }
        }
    }

    public enum Sport
    {
        NoHomeOrAwayWeighting = 1,
        NCAA_Basketball = 2,
        NCAA_Baseball = 3,
    }

    public class Team
    {
        public string Name { get; set; }
        public decimal RPI { get; internal set; }
        public decimal WP { get; internal set; }
        public decimal OWP { get; internal set; }
        public decimal OOWP { get; internal set; }
    }

    public class Match
    {
        public Team HomeTeam { get; set; }
        public Team AwayTeam { get; set; }
        public Team Winner { get; set; }
    }
}

Upvotes: 2

Thomas Ahle
Thomas Ahle

Reputation: 31604

We need to make some assumptions, as can be seen from the following example:

Team Rock beats Team Scissors
Team Paper beats Team Rock
Team Rock beats Team Scissors

Now we have a fight between Team Scissors and Team Paper. Since Team Paper has beaten Team Rock which again has beaten team Scissors twice, we might assume that the best odds are on Paper beating Scissors.

However in the above we have assumed a transitive model, which clearly isn't applicable. It may be a better fit for some sports like football, but still not exactly.

What ELO does is to assume that all teams have some 'inherent strength' on a scale from 0 to infinity. It should be obvious that no skills really work like this, but it turns out to be a useful model for predicting games. Also notice how this model doesn't work well for the Rock, Paper, Scissors game.

The next assumption made in chess is that the absolute difference in 'inherent strength' creates a probability distribution on the likelihood of one player beating another. Again, this is clearly not true, as things like having the white/black pieces also play in. It can however be made more precise (shows the evidence) by looking at the winning chances over multiple games.

With the above two assumptions we can calculate winning chances, and if the model turns out to be a good fit for the game, they may be reasonably precise. Unfortunately this sort of modelling is not an exact science, and no matter what assumptions you make, you can always find a game/situation where they don't apply.

I hope this gave you some inspiration for coming up with more assumptions for your model :) Once you have them, you can test how well it works, by seeing if it can predict some of the results you already have in your training set. There is a whole world of machine learning and statistics literature out there for you to enjoy :)

Upvotes: 2

Erti-Chris Eelmaa
Erti-Chris Eelmaa

Reputation: 26268

This problem can be solved with directed graphs.

Let all the teams be vertices, and directed edge between team1 and team2 means that team1 beat the team2.

After that is done, you can divide the graph into strongly connected components, and work with each connected component independently, as they are statistically independent. Or are they? Wink

The question we have to ask, what is the chance of team1 beating team2?

This is easy to answer, if the teams you're comparing, have direct matches between them. In this case, you care only about direct matches; eg how many times has team1 beaten team2? The way you answer the question;

(team1WinsAgainstTeam2)/(matchesPlayedBetweenThem)

Is this correct? Should the chances decrease for teamA when we know that teamB has played with teamX and WON 100 times, but teamX has always beaten teamA? If yes, discard everything what I've said in this post :-)

The final algorithm should look something like this:

double getOddsTeam1Winning(int team1, int team2){

    if(!isInSameConnectedComponent(team1, team2)){
        // if two teams are not in the same
        // connected component,

        // we can use heuristic,
        // we'll compare how many matches has team1 won,
        // compared to team2.

        var team1Wins = (team1Wins - team1Loses);
        var team2Wins = (team2Wins - team2Loses);

        return team1Wins / (team1Wins + team2Wins);
    }

    if(isDirectMatchBetween(team1, team2)){
        return team1WinsOverTeam2/
            totalMatchesPlayedBetweenTeam1AndTeam2;
    }

    List<double> odds= new List<double>();

    foreach(var opponentTeam in teamsThatTeam1HasPlayedWith){

        var oddsThatOponentTeamBeatsTeam2 = getOddsTeam1Winning(opponentTeam, team2);
        var oddsThatTeam1BeatsOpponentTeam = getOddsTeam1Winning(team1, opponentTeam);

        // combine them &  push them to odds list
    }

    return odds.Average(); // taking average of odds?!
}

ps,I put this together in few minutes, not entirely sure if it's correct mathematically, but I think that will solve the problem in your original problem you've listed, atleast one instance of it :p.

Upvotes: 0

David S&#225;nchez
David S&#225;nchez

Reputation: 606

Why dont you just use the Ratings Percentage Index from wikipedia. You can find it better explain there, but as an fast introduction you use the following formula:

RPI = (WP * 0.25) + (OWP * 0.50) + (OOWP * 0.25)

WP: Winning Percentage wins / games_played

OWP: Calculated by taking the average of the WP's for each of the team's opponents with the requirement that all games against the team in question are removed from the calculation

OOWP: Average of each Opponent's OWP.

This problem was also used by the Google Code Jam.

Hope the algortihm can help you.

All the thanks to Wikipedia.

Upvotes: 2

Julian
Julian

Reputation: 4366

You want to make a best estimate of a probability (or multiple probabilities) and continuously update your estimate as more data become available. That calls for Bayesian inference! Bayesian reasoning is based on the observation that the probability (distribution) of two things, A and B, being the case at the same time is equal to the probability (distribution) of A being the case given that B is the case times the probability that B is the case. In formula form:

P(A,B) = P(A|B)P(B)

and also

P(A,B) = P(B|A)P(A)

and hence

P(A|B)P(B) = P(B|A)P(A)

Take P(B) to the other side and we get the Bayesian update rule:

P(A|B)' = P(B|A)P(A)/P(B)

Usually A stands for whatever variable you are trying to estimate (e.g. "team x beats team y") while B stands for your observations (e.g. the full history of matches won and lost between teams). I wrote the prime (i.e. the quote in P(A|B)') to signify that the left hand of the equation represents an update of your beliefs. To make it concrete, your new estimate of the probability that team x will beat team y, given all observations so far, is the probability of doing those observations given your previous estimate, times your previous estimate, divided by the overall probability of seeing the observations you have seen (i.e. given no assumptions about relative strength between teams; one team winning most of the time is less likely than both teams winning about equally often).

The P(A|B)' from the left hand of the current update becomes the new P(A) on the right hand of the next update. You just keep repeating this as more data come in. Typically, in order to be as unbiased as possible, you start with a completely flat distribution for P(A). Over time P(A) will become more and more certain, although the algorithm is fairly well able to deal with sudden changes of the underlying probability that you're trying to estimate (e.g. if team x suddenly becomes much stronger because a new player joins the team).

The good news is that Bayesian inference works well with the beta distribution which ElKamina also mentioned. In fact the two are often combined in artificial intelligence systems that are meant to learn a probability distribution. While the beta distribution in itself is still an assumption, it has the advantage that it can take many forms (including completely flat and extremely spikey), so there's relatively little reason to be concerned that your choice of distribution might be affecting your outcome.

One piece of bad news is that you still need to make assumptions, apart from the beta distribution. For example, suppose you have the following variables:

A: team x beats team y

B: team y beats team z

C: team x beats team z

and you have observations from direct matches between x and y and from matches between y and z but not from matches between x and z. A simple (though naieve) way to estimate P(C) could be to assume transitivity:

P(C) = P(A)P(B)

Regardless how sophisticated your approach, you'll have to define some kind of structure of probabilities to deal with the gaps as well as the interdependencies in your data. Whatever structure you choose, it will always be an assumption.

Another piece of bad news is that this approach is plain complicated and I cannot give you a full account of how to apply it to your problem. Given that you need a structure of interdependent probabilities (probability of team x beating team y given other distributions involving teams x, y and z), you may want to use a Bayesian network or related analysis (for example a Markov random field or path analysis).

I hope this helps. In any case, feel free to ask for clarifications.

Upvotes: 6

ElKamina
ElKamina

Reputation: 7807

===Step 1===

Let us say two teams A and B played n matches with each other and A won m times. By applying a flat beta distribution, the probability of A winning next time is: (m+1)/(n+2).

As you can see, if m and n are large numbers, it is roughly equal to m/n.

===Step 2===

In your case I suggest the following strategy.

Let m = mp + md + mr and n = np + nd + nr

Suffixes p means prior, d means direct and r means indirect.

You can set mp and np to 1 and 2 respectively (assuming flat prior) or in an advanced fashion (detailed at the end)

md and nd are wins and games.

mr and nr are calculated with some strategy.

In the end probability of A winning is (mp+md+mr)/(np+nd+nr).

===Step 3===

How to calculate mr and nr:

You can use some kind of dampening. Eg. If A def C and C def B, count it as p wins for A against B. For longer chains, use exponential decay.

Optimal value of p can be calculated using cross validation where you leave out certain part of the data and use p that maximizes probability of that left-out data. For your particular problem I suggest leaving out games between a pair, estimated probability and comparing to actual values.

You can use: k*log^2(s/t) as penalty where k is the number of games between left-out pair A and B, s is the predicted and t the actual probability of A winning. You can also use some thing like KL divergence.

===Step 4===

Revisit setting mp and np.

You need to have lots of multiple matchups between same teams for this to work out.

For every pair of teams calculate the probability of winning and plot it. If it looks flat, using 1 and 2 as mp and np is fine. Otherwise go through http://en.wikipedia.org/wiki/Beta_distribution and pick the distribution that best matches.

Upvotes: 0

mcdowella
mcdowella

Reputation: 19601

You could try applying the http://en.wikipedia.org/wiki/Elo_rating_system, mostly on the basis that other people have used this for working out strengths. However, any such system is really based on some sort of probability model of what is really going on, and you would be better trying to come up with one for your particular game. For instance, for soccer, one approach is to model the number of goals scored by a team as a Poisson process which depends on the strength of their offense and the other side's defense. Once you have a model you can fit it to the data, for instance by maximum likelihood.

As an example of the model making a difference, look at http://en.wikipedia.org/wiki/Nontransitive_dice. This is a simple example of a situation in which A usually beats B, B usually beats C, and C usually beats A, which is not what you expect given a simple one-dimensional strength system.

Upvotes: 0

Related Questions