profesor_tortuga

Reputation: 1916

How to measure the accuracy of predictions using Python/Pandas?

I have used the Elo and Glicko rating systems along with the results for matches to generate ratings for players. Prior to each match, I can generate an expectation (a float between 0 and 1) for each player based on their respective ratings. I would like to test how accurate this expectation is, for two reasons:

There are a few differences from chess worth being aware of:

Thinking the appropriate function is "correlation", I have tried creating a DataFrame with the prediction in one column (a float between 0 and 1) and the result in the other (1 | 0.5 | 0) and calling corr(), but based on the output, I am not sure this is correct.

If I create a DataFrame containing expectations and results for only the first player in a match (the results will always be 1.0 or 0.5, since my data source never lists the loser first), corr() returns a very low value: < 0.05. However, if I create a DataFrame with two rows per match, containing the expectation and result for each player (or, alternatively, randomly choose which player to append, so results can be 0, 0.5, or 1), corr() is much higher: ~0.15 to 0.30. I don't understand why this would make a difference, which makes me wonder whether I am misusing the function or using the wrong function entirely.

If it helps, here is some real (not random) sample data: http://pastebin.com/eUzAdNij
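For reference, the correlation approach described above amounts to something like this minimal sketch (the column names and values are purely illustrative, not the actual data):

import pandas as pd

# illustrative data only: one row per match, predicted score vs. actual result
df = pd.DataFrame({
    'expectation': [0.64, 0.51, 0.77, 0.43, 0.58],
    'result':      [1.0,  0.5,  1.0,  0.0,  1.0],
})

# Pearson correlation between predicted and actual scores
print(df['expectation'].corr(df['result']))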

Upvotes: 8

Views: 6464

Answers (2)

Gena Kukartsev

Reputation: 1705

An industry-standard way to judge the accuracy of a prediction is the Receiver Operating Characteristic (ROC). You can create it from your data using sklearn and matplotlib with the code below.

ROC is a 2-D plot of the true positive rate against the false positive rate. You want the curve to lie above the diagonal, and the higher the better. The Area Under the Curve (AUC) is a standard measure of accuracy: the larger it is, the more accurate your classifier is.

import pandas as pd

# read data
df = pd.read_csv('sample_data.csv', header=None, names=['classifier','category'])

# keep only rows whose result is 0 or 1 (this drops the two drawn games)
df = df.loc[(df.category == 1.0) | (df.category == 0.0), :]

# examine data frame
df.head()

from matplotlib import pyplot as plt
# add this magic if you're in a notebook
# %matplotlib inline

from sklearn.metrics import roc_curve, auc
# matplotlib figure
figure, ax1 = plt.subplots(figsize=(8,8))

# create ROC itself
fpr,tpr,_ = roc_curve(df.category,df.classifier)

# compute AUC
roc_auc = auc(fpr,tpr)

# plotting bells and whistles
ax1.plot(fpr,tpr, label='%s (area = %0.2f)' % ('Classifier',roc_auc))
ax1.plot([0, 1], [0, 1], 'k--')
ax1.set_xlim([0.0, 1.0])
ax1.set_ylim([0.0, 1.0])
ax1.set_xlabel('False Positive Rate', fontsize=18)
ax1.set_ylabel('True Positive Rate', fontsize=18)
ax1.set_title("Receiver Operating Characteristic", fontsize=18)
plt.tick_params(axis='both', labelsize=18)
ax1.legend(loc="lower right", fontsize=14)
plt.grid(True)
plt.show()
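If you only need the AUC number rather than the plot, sklearn can also compute it directly; a minimal sketch using the same filtered data frame:

from sklearn.metrics import roc_auc_score

# same value as auc(fpr, tpr) above, without building the curve by hand
print(roc_auc_score(df.category, df.classifier))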

From your data, you should get a plot like this one: [ROC curve plot, with the AUC shown in the legend]

Upvotes: 5

ead

Reputation: 34337

Actually, what you observe makes perfect sense. If there were no draws and you always showed the expectation of the winner, there would be no correlation between the two columns at all: no matter how large or small the expectation, the result column is always 1.0, i.e. it does not depend on the expectation at all.

Because of the low percentage of draws (draws probably correlate with expectations around 0.5), you can still observe a small correlation.
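To illustrate the point (with made-up numbers, not the actual data): if the result column is constant, corr() has nothing to vary against and returns NaN:

import pandas as pd

# winner always listed first and no draws: the result column has zero variance
df = pd.DataFrame({
    'expectation': [0.55, 0.71, 0.62, 0.90],
    'result':      [1.0,  1.0,  1.0,  1.0],
})
print(df['expectation'].corr(df['result']))   # NaN, since 'result' never varies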

Maybe correlation is not the best measure of prediction accuracy here.

One of the problems is that Elo does not predict a single result but the expected number of points. There is at least one unknown factor: the probability of a draw. You have to put additional knowledge about the draw probability into your model. This probability depends on the strength difference between the players: the bigger the difference, the smaller the chance of a draw. One could try the following approaches (a rough sketch of all three follows the list):

  1. Map the expected points onto expected results, e.g. 0...0.4 means a loss, 0.4...0.6 a draw, and 0.6...1.0 a win, and count how many results are predicted correctly.
  2. For a player and a set of games, the measure of accuracy would be |predicted_score - score| / number_of_games, averaged over the players. The smaller the difference, the better.
  3. A kind of Bayesian approach: if the predicted number of points for a game is x, then the score of the predictor is x if the game was won and 1 - x if it was lost (you may have to skip the draws, or score them as (1-x)*x*4 so that a prediction of 0.5 gets a score of 1). The overall score of the predictor over all games would be the product of the single-game scores. The bigger the score, the better.
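Here is a rough sketch of all three measures; the column names 'expectation' and 'result' and the example values are assumptions for illustration only:

import numpy as np
import pandas as pd

# made-up example: predicted points (0..1) and actual results (0, 0.5 or 1)
df = pd.DataFrame({
    'expectation': [0.62, 0.48, 0.81, 0.55],
    'result':      [1.0,  0.5,  1.0,  0.0],
})
x, r = df['expectation'], df['result']

# 1. map expected points onto expected results and count correct predictions
predicted = np.select([x < 0.4, x < 0.6], [0.0, 0.5], default=1.0)
accuracy = (predicted == r).mean()

# 2. mean absolute difference between predicted and actual score
mae = (x - r).abs().mean()

# 3. product of per-game scores: x for a win, 1-x for a loss, (1-x)*x*4 for a draw
per_game = np.where(r == 1.0, x, np.where(r == 0.0, 1 - x, (1 - x) * x * 4))
overall_score = per_game.prod()

print(accuracy, mae, overall_score)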

Upvotes: 2
