Thresholds, False Positive Rate, True Positive Rate

Question

I am trying to get a clear understanding of what goes into calculating the terms in the title. The documentation at https://scikit-learn.org/stable/modules/model_evaluation.html#roc-metrics says:

"A receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the positives (TPR = true positive rate) vs. the fraction of false positives out of the negatives (FPR = false positive rate), at various threshold settings."
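Written out, that sentence amounts to the two ratios below. Here y_i is the true label of sample i, s_i is its predicted score, t is a threshold, P is the number of actual positives and N the number of actual negatives. A score equal to the threshold counts as a positive prediction (s_i ≥ t), which is the convention the scikit-learn roc_curve documentation states for its returned thresholds.

```latex
\mathrm{TPR}(t) = \frac{\mathrm{TP}(t)}{P}
  = \frac{\left|\{\, i : y_i = 1,\ s_i \ge t \,\}\right|}{\left|\{\, i : y_i = 1 \,\}\right|},
\qquad
\mathrm{FPR}(t) = \frac{\mathrm{FP}(t)}{N}
  = \frac{\left|\{\, i : y_i = 0,\ s_i \ge t \,\}\right|}{\left|\{\, i : y_i = 0 \,\}\right|}
```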

Here is some simple code I wrote around some predictions I made with Keras.

import numpy as np
from sklearn import metrics

# True labels (1 = positive class) and the scores predicted by the Keras model
test1 = '0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0'
pred1 = '0.04172871 0.01611879 0.01073375 0.03344169 0.04172871 0.04172871\
 0.00430162 0.04172871 0.04172871 0.04172871 0.07977659 0.905772\
 0.9396076  0.03344169 0.04172871 0.09125287 0.02964183 0.0641269\
 0.04172871 0.04172871 0.04172871 0.0641269  0.04172871 0.04172871\
 0.9919831  0.04172871 0.01611879 0.04172871 0.37865442 0.00240888'

# Parse the whitespace-separated strings into NumPy arrays
test = np.array([int(i) for i in test1.split()])
pred = np.array([float(i) for i in pred1.split()])

print(type(test))
print(type(pred))

# roc_curve returns one (FPR, TPR) pair per threshold, highest threshold first
fpr, tpr, thresholds = metrics.roc_curve(test, pred)
print('false pos rate')
print(fpr)
print('true pos rate')
print(tpr)
print('thresholds')
print(thresholds)

I can see roughly how it has chosen the thresholds (in this case 10 values, running from the lowest prediction up to the highest prediction + 1), but why 10 threshold values here and not some other number? I'd also like to be able to follow the arithmetic by which it gets the fpr and tpr values from the thresholds. The answer is probably in the documentation sentence quoted above, but I have not yet got my head around how the rate calculation works.
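A sketch of where the count of 10 comes from, assuming a scikit-learn version whose roc_curve has the drop_intermediate parameter (default True). Every distinct score is a candidate threshold, so these 30 predictions give 14 candidates. With the default setting, roc_curve discards a candidate when its (FP, TP) point sits between its two neighbours with equal steps in both counts, i.e. on a straight segment where it cannot change the plotted curve; the rule is conservative, so some collinear points are still kept. Here 5 candidates are dropped, leaving 9, and roc_curve prepends one extra threshold above every score so that the curve starts at (0, 0): 9 + 1 = 10. That extra threshold is max(score) + 1 (the 1.9919831 in the output) in the version used here; recent scikit-learn releases use np.inf instead.

```python
import numpy as np
from sklearn import metrics

# Same labels and Keras scores as in the question
test1 = '0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0'
pred1 = ('0.04172871 0.01611879 0.01073375 0.03344169 0.04172871 0.04172871 '
         '0.00430162 0.04172871 0.04172871 0.04172871 0.07977659 0.905772 '
         '0.9396076 0.03344169 0.04172871 0.09125287 0.02964183 0.0641269 '
         '0.04172871 0.04172871 0.04172871 0.0641269 0.04172871 0.04172871 '
         '0.9919831 0.04172871 0.01611879 0.04172871 0.37865442 0.00240888')
test = np.array([int(i) for i in test1.split()])
pred = np.array([float(i) for i in pred1.split()])

# Each distinct score is a candidate threshold
n_distinct = len(np.unique(pred))
print('distinct scores:', n_distinct)  # 14

# Keep every candidate: one threshold per distinct score plus the extra first one
_, _, thr_all = metrics.roc_curve(test, pred, drop_intermediate=False)
print('thresholds, drop_intermediate=False:', len(thr_all))  # 15

# Default behaviour: evenly spaced points on straight segments are dropped
fpr, tpr, thr = metrics.roc_curve(test, pred)
print('thresholds, default:', len(thr))  # 10

# The scores that were discarded as thresholds by the default setting
print('dropped:', np.setdiff1d(thr_all, thr))
```

The dropped scores are 0.9396076 and 0.905772 (which only move TPR up the FPR = 0 axis in equal steps) and 0.09125287, 0.01073375 and 0.00430162 (which move FPR along the TPR = 1 line in equal steps).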

Here are, respectively, the thresholds, the false positive rates, and the true positive rates:

[1.9919831  0.9919831  0.37865442 0.07977659 0.0641269  0.04172871
 0.03344169 0.02964183 0.01611879 0.00240888]
[0.         0.         0.         0.07692308 0.15384615 0.69230769
 0.76923077 0.80769231 0.88461538 1.        ]
[0.   0.25 1.   1.   1.   1.   1.   1.   1.   1.  ]
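To follow the arithmetic, one can recount by hand: a sample is predicted positive when its score is at or above the threshold, TPR is the number of those that are truly positive divided by P = 4, and FPR is the number that are truly negative divided by N = 26. For example, at t = 0.0641269 all four positives score at least t, and so do four negatives (0.09125287, 0.07977659 and the two 0.0641269 values), giving TPR = 4/4 = 1 and FPR = 4/26 ≈ 0.1538, the fifth entries of the arrays above. The following minimal sketch, assuming the same data, repeats that count for every threshold roc_curve returns and compares the results.

```python
import numpy as np
from sklearn import metrics

# Same labels and Keras scores as in the question
test1 = '0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0'
pred1 = ('0.04172871 0.01611879 0.01073375 0.03344169 0.04172871 0.04172871 '
         '0.00430162 0.04172871 0.04172871 0.04172871 0.07977659 0.905772 '
         '0.9396076 0.03344169 0.04172871 0.09125287 0.02964183 0.0641269 '
         '0.04172871 0.04172871 0.04172871 0.0641269 0.04172871 0.04172871 '
         '0.9919831 0.04172871 0.01611879 0.04172871 0.37865442 0.00240888')
test = np.array([int(i) for i in test1.split()])
pred = np.array([float(i) for i in pred1.split()])

fpr, tpr, thresholds = metrics.roc_curve(test, pred)

n_pos = int(np.sum(test == 1))  # P: number of actual positives (4 here)
n_neg = int(np.sum(test == 0))  # N: number of actual negatives (26 here)

manual_tpr = []
manual_fpr = []
for t in thresholds:
    predicted_pos = pred >= t                      # score at or above t -> predicted positive
    tp = int(np.sum(predicted_pos & (test == 1)))  # true positives at this threshold
    fp = int(np.sum(predicted_pos & (test == 0)))  # false positives at this threshold
    manual_tpr.append(tp / n_pos)
    manual_fpr.append(fp / n_neg)
    print(f'threshold {t:.8f}: TPR = {tp}/{n_pos}, FPR = {fp}/{n_neg}')

manual_tpr = np.array(manual_tpr)
manual_fpr = np.array(manual_fpr)
print(np.allclose(manual_tpr, tpr), np.allclose(manual_fpr, fpr))  # True True
```

Because every rate is a count divided by 4 or by 26, TPR can only move in steps of 0.25 and FPR in steps of 1/26 ≈ 0.0385.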
