Steven Billard

Reputation: 53

Scikit-learn algorithm has incorrect predictions but ROC curve is perfect?

It's my first time using scikit-learn metrics, and I want to graph a ROC curve using this library.

This ROC curve says AUC = 1.00, which I know is incorrect. Here is the code:

from sklearn.metrics import roc_curve, auc
import pylab as pl

def show_roc(actual, prediction_probas):
    # Compute the ROC points and the area under the curve
    fpr, tpr, thresholds = roc_curve(actual, prediction_probas)
    roc_auc = auc(fpr, tpr)

    # Plot ROC curve
    pl.clf()
    pl.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
    pl.plot([0, 1], [0, 1], 'k--')
    pl.xlim([-0.1, 1.2])
    pl.ylim([-0.1, 1.2])
    pl.xlabel('False Positive Rate')
    pl.ylabel('True Positive Rate')
    pl.title('Receiver operating characteristic example')
    pl.legend(loc="lower right")
    pl.show()

# set number 1
actual = [1, -1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1]
prediction_probas = [0.374, 0.145, 0.263, 0.129, 0.215, 0.538, 0.24, 0.183, 0.402, 0.2, 0.281,
                     0.277, 0.222, 0.204, 0.193, 0.171, 0.401, 0.204, 0.213, 0.182]

show_roc(actual, prediction_probas)

For this first set, here is the graph: https://i.sstatic.net/pa93c.png

The probabilities are very low, especially for the positives, so I don't know why it displays a perfect ROC curve for these inputs.

# set number 2
actual = [1, 1, 1, 0, 0, 0]
prediction_probas = [0.9, 0.9, 0.1, 0.1, 0.1, 0.1]

show_roc(actual, prediction_probas)

For the second set, here is the graph output:

This one seems more reasonable, and I included it for comparison.

I have read through the scikit-learn documentation pretty much all day, and I am stumped.

Upvotes: 4

Views: 815

Answers (1)

Tommy

Reputation: 620

You are getting a perfect curve because your labels (actual) line up with your prediction scores (prediction_probas). Even though the scores for the true positives are low, there is still a clean boundary between the 1s and the -1s: every positive scores higher than every negative, so some threshold classifies all of them correctly. The ROC curve only depends on how the scores rank the examples, not on their absolute values.
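A quick way to check that ranking (a minimal sketch using the set 1 data from the question; roc_auc_score is scikit-learn's convenience wrapper for the AUC):

from sklearn.metrics import roc_auc_score

actual = [1, -1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1]
scores = [0.374, 0.145, 0.263, 0.129, 0.215, 0.538, 0.24, 0.183, 0.402, 0.2, 0.281,
          0.277, 0.222, 0.204, 0.193, 0.171, 0.401, 0.204, 0.213, 0.182]

# The lowest positive score is still above the highest negative score,
# so any threshold between 0.281 and 0.374 separates them perfectly.
positives = [s for a, s in zip(actual, scores) if a == 1]
negatives = [s for a, s in zip(actual, scores) if a == -1]
print(min(positives), max(negatives))  # 0.374 0.281

# The AUC only depends on this ranking, not on the absolute values:
print(roc_auc_score(actual, scores))   # 1.0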

Try changing one of the higher-scored 1s to a -1, or any of the -1s to a 1, and see the resulting curve.
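For example, continuing the snippet above (flipping the top-scored 1 is just one way to break the ranking):

# Flip the highest-scored positive (0.538) to a -1; that one negative now
# outranks every positive, so the AUC drops (to about 0.94 here).
actual[5] = -1
print(roc_auc_score(actual, scores))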

Upvotes: 1
