profesor_tortuga

Reputation: 1916

How to measure the accuracy of predictions using Python/Pandas?

I have used the Elo and Glicko rating systems along with the results for matches to generate ratings for players. Prior to each match, I can generate an expectation (a float between 0 and 1) for each player based on their respective ratings. I would like to test how accurate this expectation is, for two reasons:

There are a few differences from chess worth being aware of:

Thinking the appropriate function is "correlation", I have tried creating a DataFrame with the prediction in one column (a float between 0 and 1) and the result in the other (1 | 0.5 | 0) and calling corr(), but based on the output, I am not sure this is correct.

If I create a DataFrame containing expectations and results for only the first player in a match (the results will always be 1.0 or 0.5, since my data source never lists the loser first), corr() returns a very low value: < 0.05. However, if I create a DataFrame with two rows per match, containing the expectation and result for each player (or, alternatively, randomly choose which player to append, so results can be 0, 0.5, or 1), corr() is much higher: ~0.15 to 0.30. I don't understand why this would make a difference, which makes me wonder whether I am misusing the function or using the wrong function entirely.

If it helps, here is some real (not random) sample data: http://pastebin.com/eUzAdNij
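For reference, the correlation approach described above amounts to something like this minimal sketch (the column names and values are purely illustrative, not the actual data):

import pandas as pd

# illustrative data only: one row per match, predicted score vs. actual result
df = pd.DataFrame({
    'expectation': [0.64, 0.51, 0.77, 0.43, 0.58],
    'result':      [1.0,  0.5,  1.0,  0.0,  1.0],
})

# Pearson correlation between predicted and actual scores
print(df['expectation'].corr(df['result']))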

Upvotes: 8

Views: 6464

Answers (2)

Gena Kukartsev

Reputation: 1705

An industry-standard way to judge the accuracy of a prediction is the Receiver Operating Characteristic (ROC). You can create it from your data using sklearn and matplotlib with the code below.

ROC is a 2-D plot of the true positive rate against the false positive rate. You want the curve to lie above the diagonal, and the higher the better. The Area Under the Curve (AUC) is a standard measure of accuracy: the larger it is, the more accurate your classifier is.

import pandas as pd

# read data
df = pd.read_csv('sample_data.csv', header=None, names=['classifier','category'])

# keep only rows whose result is 0 or 1 (this drops the two drawn games)
df = df.loc[(df.category == 1.0) | (df.category == 0.0), :]

# examine data frame
df.head()

from matplotlib import pyplot as plt
# add this magic if you're in a notebook
# %matplotlib inline

from sklearn.metrics import roc_curve, auc
# matplotlib figure
figure, ax1 = plt.subplots(figsize=(8,8))

# create ROC itself
fpr,tpr,_ = roc_curve(df.category,df.classifier)

# compute AUC
roc_auc = auc(fpr,tpr)

# plotting bells and whistles
ax1.plot(fpr,tpr, label='%s (area = %0.2f)' % ('Classifier',roc_auc))
ax1.plot([0, 1], [0, 1], 'k--')
ax1.set_xlim([0.0, 1.0])
ax1.set_ylim([0.0, 1.0])
ax1.set_xlabel('False Positive Rate', fontsize=18)
ax1.set_ylabel('True Positive Rate', fontsize=18)
ax1.set_title("Receiver Operating Characteristic", fontsize=18)
plt.tick_params(axis='both', labelsize=18)
ax1.legend(loc="lower right", fontsize=14)
plt.grid(True)
plt.show()
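If you only need the AUC number rather than the plot, sklearn can also compute it directly; a minimal sketch using the same filtered data frame:

from sklearn.metrics import roc_auc_score

# same value as auc(fpr, tpr) above, without building the curve by hand
print(roc_auc_score(df.category, df.classifier))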

From your data, you should get a plot like this one: [ROC curve plot, with the AUC shown in the legend]

Upvotes: 5

ead

Reputation: 34337

Actually, what you observe makes perfect sense. If there were no draws and you always showed the expectation of the winner, there would be no correlation between the two columns at all: no matter how large or small the expectation, the result column is always 1.0, i.e. it does not depend on the expectation at all.

Because of the low percentage of draws (draws probably correlate with expectations around 0.5), you can still observe a small correlation.
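To illustrate the point (with made-up numbers, not the actual data): if the result column is constant, corr() has nothing to vary against and returns NaN:

import pandas as pd

# winner always listed first and no draws: the result column has zero variance
df = pd.DataFrame({
    'expectation': [0.55, 0.71, 0.62, 0.90],
    'result':      [1.0,  1.0,  1.0,  1.0],
})
print(df['expectation'].corr(df['result']))   # NaN, since 'result' never varies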

Maybe correlation is not the best measure of prediction accuracy here.

One of the problems is that Elo does not predict a single result but the expected number of points. There is at least one unknown factor: the probability of a draw. You have to put additional knowledge about the draw probability into your model. This probability depends on the strength difference between the players: the bigger the difference, the smaller the chance of a draw. One could try the following approaches (a rough sketch of all three follows the list):

  1. Map the expected points onto expected results, e.g. 0...0.4 means a loss, 0.4...0.6 a draw, and 0.6...1.0 a win, and count how many results are predicted correctly.
  2. For a player and a set of games, the measure of accuracy would be |predicted_score - score| / number_of_games, averaged over the players. The smaller the difference, the better.
  3. A kind of Bayesian approach: if the predicted number of points for a game is x, then the score of the predictor is x if the game was won and 1 - x if it was lost (you may have to skip the draws, or score them as (1-x)*x*4 so that a prediction of 0.5 gets a score of 1). The overall score of the predictor over all games would be the product of the single-game scores. The bigger the score, the better.
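Here is a rough sketch of all three measures; the column names 'expectation' and 'result' and the example values are assumptions for illustration only:

import numpy as np
import pandas as pd

# made-up example: predicted points (0..1) and actual results (0, 0.5 or 1)
df = pd.DataFrame({
    'expectation': [0.62, 0.48, 0.81, 0.55],
    'result':      [1.0,  0.5,  1.0,  0.0],
})
x, r = df['expectation'], df['result']

# 1. map expected points onto expected results and count correct predictions
predicted = np.select([x < 0.4, x < 0.6], [0.0, 0.5], default=1.0)
accuracy = (predicted == r).mean()

# 2. mean absolute difference between predicted and actual score
mae = (x - r).abs().mean()

# 3. product of per-game scores: x for a win, 1-x for a loss, (1-x)*x*4 for a draw
per_game = np.where(r == 1.0, x, np.where(r == 0.0, 1 - x, (1 - x) * x * 4))
overall_score = per_game.prod()

print(accuracy, mae, overall_score)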

Upvotes: 2
