Reputation: 3337
I understand percentiles in the context of test scores with many examples (e.g., your SAT score falls in the 99th percentile), but I am not sure I understand what a percentile means in the following context. Imagine a model outputs probabilities (on some days we have a lot of new data and many outputted probabilities, and on some days we don't). Imagine I want to compute the 99th percentile of the outputted probabilities. Here are the probabilities for today:
import numpy as np
a = np.array([0, 0.2, 0.4, 0.7, 1])
p = np.percentile(a, 99)
print(p)
0.988
I don't understand how the 99th percentile is computed in this situation, where there are only 5 outputted probabilities. How was the output computed? Thanks!
Upvotes: 2
Views: 698
Reputation: 164693
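Linear interpolation is applied. With 5 values, the sorted entries sit at the 0th, 25th, 50th, 75th, and 100th percentiles, so the 99th percentile falls between 0.7 (75th) and 1.0 (100th). You can check consistency yourself: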
a = np.array([0,0.2,0.4,0.7,1])
np.sort(a) # array([ 0. , 0.2, 0.4, 0.7, 1. ])
np.percentile(a, 75) # 0.70
np.percentile(a, 100) # 1.0
np.percentile(a, 99) # 0.988
0.70 + (1.0 - 0.70) * (99 - 75) / (100 - 75) # 0.988
The documentation also specifies 'linear' as the default:
numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
'linear': i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
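To make the quoted formula concrete, here is a minimal pure-Python sketch of linear-interpolation percentiles. The helper name percentile_linear is hypothetical (it is not part of NumPy), and the sketch assumes the fractional index is computed as q / 100 * (n - 1) over the sorted data, which is consistent with the numbers shown above.

```python
import math

def percentile_linear(values, q):
    """Hypothetical helper: q-th percentile using linear interpolation."""
    s = sorted(values)
    # Assumed convention: position q/100 of the way along indices 0..n-1.
    pos = (q / 100) * (len(s) - 1)
    lo = math.floor(pos)            # index of i, the lower neighbour
    hi = min(lo + 1, len(s) - 1)    # index of j, the upper neighbour
    fraction = pos - lo             # fractional part of the index
    # i + (j - i) * fraction
    return s[lo] + (s[hi] - s[lo]) * fraction

a = [0, 0.2, 0.4, 0.7, 1]
manual_p99 = percentile_linear(a, 99)
print(manual_p99)  # approximately 0.988
```

For q = 99 the index is 0.99 * 4 = 3.96, so i = 0.7, j = 1.0, and fraction = 0.96, which gives 0.7 + 0.3 * 0.96, roughly 0.988, the same value np.percentile(a, 99) reports.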
Upvotes: 4