Reputation: 6333
I see a lot of questions like this one for R, but I couldn't find one specifically for Python, preferably using numpy.
Let's say I have an array of observations stored in x. I can get the value that accumulates q * 100 per cent of the population:
# Import numpy
import numpy as np
# Get 75th percentile
np.quantile(a=x, q=0.75)
However, I was wondering if there's a function that does the inverse. That is, a numpy function that takes a value as input and returns q.
To further expand on this, scipy distribution objects have a cdf method (the inverse of ppf) that allows me to do this. I'm looking for something similar in numpy. Does it exist?
Upvotes: 15
Views: 16428
Reputation: 85
While vals = x.argsort().argsort()/(x.size-1) works for arrays whose values are all unique, it fails if there are repeated values. Identical values should have the same quantile value, but if, for example, the array x had 200 zeros and 800 values larger than zero, this method would assign 200 different quantile values to those zeros. It is safer to use vals = np.array([np.count_nonzero(x<x_i)/(x.size-1) for x_i in x]), since identical values then get identical quantile positions.
import numpy as np

def get_quant(x):
    """For each value in x, return which quantile it corresponds to."""
    return np.array([np.count_nonzero(x < x_i) / (len(x) - 1) for x_i in x])
Note: the (x.size-1) denominators ensure the quantile values range from 0 to 1 inclusive, provided the maximum of x is not repeated. Leaving out the -1 means the 100% quantile is never reached.
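The list comprehension above compares every element against the whole array, which can get slow for large inputs. A possible vectorized sketch of the same idea is below; the function name get_quant_sorted is made up for illustration, and it assumes x is a one-dimensional numeric array with at least two elements.

```python
import numpy as np

def get_quant_sorted(x):
    """Hypothetical vectorized variant of get_quant (illustrative name)."""
    x = np.asarray(x)
    sorted_x = np.sort(x)
    # With side="left", searchsorted returns for each element the number of
    # values in the array that are strictly smaller than it.
    below = np.searchsorted(sorted_x, x, side="left")
    return below / (x.size - 1)

x = np.array([0, 0, 3, 1, 2])
quants = get_quant_sorted(x)  # values: 0, 0, 1, 0.5, 0.75
```

Both zeros get the same quantile position here, as with the loop version.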
Upvotes: 0
Reputation: 968
Not a ready-made function but a compact and reasonably fast snippet:
(a<value).mean()
You can (at least on my machine) squeeze out a few percent better performance by using np.count_nonzero:
np.count_nonzero(a<value) / a.size
but tbh I wouldn't even bother.
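A small illustration of how this behaves when value is itself in the array. The sample data is made up; the point is only that a strict comparison and a non-strict one give different fractions when there are ties, so pick whichever definition you need.

```python
import numpy as np

# Made-up sample data for illustration
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
value = 3.0

# Fraction of observations strictly below value: 2 of 5
frac_below = (a < value).mean()

# Fraction of observations at or below value: 3 of 5
frac_at_or_below = (a <= value).mean()
```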
Upvotes: 22
Reputation: 4510
There's a convenience function in scipy that does this. Note that it's not an exact inverse because the quantile/percentile functions are not exact. Given a finite array of observations, the percentiles take discrete values; in other words, you may specify a q that falls between those values, in which case the functions interpolate between them (linearly, by default) or pick a neighbouring one, depending on the method.
from scipy import stats
import numpy as np
stats.percentileofscore(np.arange(0,1,0.12), .65, 'weak') / 100
Upvotes: 11
Reputation: 114330
If x is sorted, the value at index i is the i / len(x) quantile (or so, depending on how you want to treat boundary conditions). If x is not sorted, you can obtain the same value by substituting x.argsort().argsort()[i] for i (or just sorting x first). Since applying argsort to a permutation yields its inverse, the double argsort tells you where each element of the original would fall in the sorted array.
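To make the double argsort concrete, here is a tiny sketch on made-up data with unique values (with ties, equal values would get different ranks):

```python
import numpy as np

# Made-up sample data with unique values
x = np.array([30, 10, 40, 20])

# Indices that would sort x: [1, 3, 0, 2]
order = x.argsort()

# Position of each original element in the sorted array: [2, 0, 3, 1]
ranks = order.argsort()

# Approximate quantile of each element, using the i / len(x) convention
quantiles = ranks / x.size  # values: 0.5, 0.0, 0.75, 0.25
```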
If you want to find the result for arbitrary values not necessarily in x, you can apply np.searchsorted to a sorted version of x and interpolate on the result. You could also use a more complicated method, like fitting a spline to the sorted data or something similar.
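One way to sketch the search-and-interpolate idea is shown below. It uses np.interp in place of an explicit np.searchsorted call plus manual interpolation, the function name inverse_quantile is made up, and it assumes the values in x are unique so the sorted array is strictly increasing. Under those assumptions it should roughly invert np.quantile with its default linear method.

```python
import numpy as np

def inverse_quantile(x, values):
    """Hypothetical helper: approximate q for each value (illustrative name)."""
    sorted_x = np.sort(np.asarray(x, dtype=float))
    # Place the sorted samples evenly on the [0, 1] quantile axis
    q_grid = np.linspace(0.0, 1.0, sorted_x.size)
    # Linear interpolation; values outside the observed range clip to 0 or 1
    return np.interp(values, sorted_x, q_grid)

x = [10, 20, 30, 40, 50]
q = inverse_quantile(x, 25)       # 0.375
round_trip = np.quantile(x, q)    # 25.0
```

With repeated values in x the interpolation becomes ambiguous, so a counting approach may be the safer choice there.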
Upvotes: 2