When I used sklearn's roc_curve function on my data with a logistic regression model:
roc_curve(y_test, predictions_test)
I got this result:
(array([0. , 0.1, 1. ]), array([0. , 0.865, 1. ]), array([2, 1, 0]))
I know that the third array contains the thresholds and that the first and second contain the corresponding FPR and TPR. But I don't understand why there are three thresholds. How is the number of thresholds determined in this function? For example, with logistic regression I would expect the thresholds to be probabilities from the sigmoid function, but here they are 2, 1, 0. Why?
Upvotes: 2
Views: 1297
Reputation: 3036
As you can see from the source code (in _binary_clf_curve(), which roc_curve() calls), the number of thresholds is determined by the number of distinct values in predictions_test (scores, in principle). roc_curve() also prepends one extra threshold so that the curve starts at (0, 0); in the version that produced your output it is max(score) + 1, which is where the 2 comes from. From your output, I would guess that predictions_test is the output of .predict() (perhaps of a multiclass classification problem? In that case, by the way, you'll need to extend the ROC curve definition to the multiclass setting) rather than of .predict_proba() or .decision_function(), as roc_curve requires.
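To make the point about distinct scores concrete, here is a minimal pure-Python sketch of how the threshold list is built. The helper roc_thresholds is hypothetical and only mimics the behaviour described above (one threshold per distinct score plus a leading max + 1); it is not scikit-learn's actual code, and the sample labels and probabilities are made up for illustration.

```python
def roc_thresholds(y_score):
    """Hypothetical helper: mimic how the thresholds are derived from scores.

    One threshold per distinct score, highest first, plus one extra
    leading threshold above every score so the curve starts at (0, 0).
    """
    distinct = sorted(set(y_score), reverse=True)
    # Assumption: max + 1 as the extra threshold, matching the output in the
    # question; as far as I know, newer scikit-learn releases use inf instead.
    return [distinct[0] + 1] + distinct


# Made-up data: hard class labels, as .predict() would return them.
hard_labels = [0, 1, 1, 0, 1]
# Made-up data: positive-class probabilities, e.g. .predict_proba()[:, 1].
probabilities = [0.12, 0.81, 0.67, 0.35, 0.93]

print(roc_thresholds(hard_labels))         # [2, 1, 0]
print(len(roc_thresholds(probabilities)))  # 6
```

With hard 0/1 labels there are only two distinct scores, so you get exactly the three thresholds 2, 1, 0 from the question. With probabilities, every distinct value becomes a candidate threshold.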
Moreover, be aware that roc_curve also has a drop_intermediate parameter (True by default) which, in some cases, drops suboptimal thresholds.
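The effect of drop_intermediate can be checked on a small example. This is only a sketch with made-up labels and scores; in practice y_score would come from something like model.predict_proba(X_test)[:, 1], where model and X_test are placeholders for your own objects.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up labels and continuous scores, for illustration only.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_score = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9, 0.4])

# Keep every threshold: one per distinct score plus the extra leading one.
_, _, thr_all = roc_curve(y_true, y_score, drop_intermediate=False)

# Default behaviour (drop_intermediate=True): thresholds that do not change
# the shape of the curve may be dropped.
_, _, thr_default = roc_curve(y_true, y_score)

print(len(thr_all), len(np.unique(y_score)) + 1)  # the two counts match
print(len(thr_default))                           # never more than len(thr_all)
```

So if you see fewer thresholds than distinct scores, drop_intermediate is the likely reason; pass drop_intermediate=False to get the full list.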
Finally, I'd suggest looking at the related posts on drop_intermediate=True.
Upvotes: 1