Reputation: 485
Following is a snippet from scikit-learn's precision-recall curve computation (precision_recall_curve):
>>> import numpy as np
>>> from sklearn.metrics import precision_recall_curve
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> precision, recall, thresholds = precision_recall_curve(
... y_true, y_scores)
>>> precision
array([ 0.66..., 0.5 , 1. , 1. ])
>>> recall
array([ 1. , 0.5, 0.5, 0. ])
>>> thresholds
array([ 0.35, 0.4 , 0.8 ])
Doubt:
Why are there only 3 thresholds while 4 precision and recall values are returned? As one can clearly see, the threshold of 0.1 is left out, and the computation starts from the threshold 0.35 upwards.
Upvotes: 1
Views: 1253
Reputation: 1629
The thresholds only go low enough to attain 100% recall. The idea is that you generally wouldn't set a lower threshold, since it would only introduce unnecessary false positives. In addition, as the source below shows, precision_recall_curve appends a final point with precision 1 and recall 0 that has no corresponding threshold, which is why precision and recall each have one more entry than thresholds.
https://github.com/scikit-learn/scikit-learn/blob/a24c8b46/sklearn/metrics/ranking.py
# stop when full recall attained
# and reverse the outputs so recall is decreasing
last_ind = tps.searchsorted(tps[-1])
sl = slice(last_ind, None, -1)
return np.r_[precision[sl], 1], np.r_[recall[sl], 0], thresholds[sl]
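For illustration, here is a minimal sketch (not scikit-learn's implementation) that recomputes precision and recall at every distinct score used as a threshold, predicting positive when score >= threshold. It shows that the 0.1 threshold yields no more recall than 0.35 (hence it is dropped) and that the final point (precision=1, recall=0) is appended without a threshold of its own:

import numpy as np

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Evaluate precision and recall at each distinct score used as a threshold.
for t in np.sort(np.unique(y_scores)):
    y_pred = (y_scores >= t).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    print(f"threshold={t:.2f}  precision={tp / (tp + fp):.3f}  recall={tp / (tp + fn):.3f}")

# threshold=0.10  precision=0.500  recall=1.000  <- dropped: recall is already 1.0 at 0.35
# threshold=0.35  precision=0.667  recall=1.000
# threshold=0.40  precision=0.500  recall=0.500
# threshold=0.80  precision=1.000  recall=0.500
# precision_recall_curve then appends (precision=1, recall=0) with no threshold,
# giving 4 precision/recall values but only 3 thresholds.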
Upvotes: 2