Reputation: 75575
I have looked at this answer, which explains how to compute the value of a specific percentile, and this answer, which explains how to compute the percentiles that correspond to each element.
Using the first solution, I can compute the value and scan the original array to find the index.
Using the second solution, I can scan the entire output array for the percentile I'm looking for.
However, both require an additional scan if I want to know the index (in the original array) that corresponds to a particular percentile (or the index containing the element closest to that percentile).
Is there a more direct or built-in way to get the index which corresponds to a percentile?
Note: My array is not sorted and I want the index in the original, unsorted array.
Upvotes: 21
Views: 20257
Reputation: 8087
If numpy is to be used, one can use the built-in percentile function, but the way you do this depends on the version you have: very old (< v1.9.0), old (< v1.22.0) or new (>= v1.22.0).
From v1.22.0 of numpy you can write
np.percentile(x, p, method="method")
with method chosen from:
'inverted_cdf'
'averaged_inverted_cdf'
'closest_observation'
'interpolated_inverted_cdf'
'hazen'
'weibull'
'linear' (default)
'median_unbiased'
'normal_unbiased'
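A minimal sketch of that route, assuming numpy >= 1.22: options that always return one of the observed entries (for example 'lower', 'higher' and 'nearest') can be paired with an argmin over the absolute difference to recover the index in the original, unsorted array. The helper name percentile_index is hypothetical, not part of numpy.

```python
import numpy as np


def percentile_index(x, p, method="nearest"):
    """Hypothetical helper: index in the unsorted array x of the entry that
    np.percentile(x, p, method=method) returns (needs numpy >= 1.22)."""
    x = np.asarray(x)
    # With a discontinuous option such as 'lower', 'higher' or 'nearest',
    # the percentile is one of the entries of x.
    value = np.percentile(x, p, method=method)
    # Find that entry in the original, unsorted order.
    return int(np.abs(x - value).argmin())


x = np.array([12, 19, 11, 28, 10])
for m in ("lower", "higher", "nearest"):
    i = percentile_index(x, 75, method=m)
    print(m, i, x[i])
```

With a continuous method such as the default 'linear', the returned value usually falls between two entries, so the argmin then gives the closest entry rather than an exact match.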
For older versions before v1.22
NOTE: The original answer below is deprecated from numpy v1.22.0 onwards: the argument interpolation is deprecated and has been renamed method. The lower, higher and nearest options are retained for backwards compatibility as discontinuous variations of the default method linear. New methods have also been added; see the documentation for details.
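Code that must run on both sides of the rename could try the new keyword first and fall back to the old one. This is only a sketch: the helper name percentile_value is hypothetical, and it relies on older numpy rejecting the unknown method keyword with a TypeError, which is how Python treats unexpected keyword arguments.

```python
import numpy as np


def percentile_value(x, p, kind="nearest"):
    """Hypothetical helper: percentile of x using a discontinuous option
    ('lower', 'higher' or 'nearest') on old and new numpy alike."""
    try:
        # numpy >= 1.22 spells the option `method`
        return np.percentile(x, p, method=kind)
    except TypeError:
        # numpy 1.9 to 1.21 only knows the older `interpolation` keyword
        return np.percentile(x, p, interpolation=kind)


print(percentile_value([1, 2, 3, 4], 50, "lower"))
```

Catching TypeError this broadly can also hide unrelated mistakes in the arguments, so it is worth keeping the try block as small as shown.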
From version 1.9.0 of numpy, percentile has the option "interpolation" that allows you to pick out the lower/higher/nearest percentile value. The following works on unsorted arrays and finds the index of the entry nearest to the percentile:
import numpy as np
p=70 # my desired percentile, here 70%
x=np.random.uniform(0,10,size=1000)-5.0 # dummy vector, uniform on [-5, 5)
# index of array entry nearest to percentile value
pcen=np.percentile(x,p,interpolation='nearest')
i_near=abs(x-pcen).argmin()
Most people will want the nearest percentile value as above. But just for completeness, you can also ask for the entry just below or just above the percentile value:
# Value of the array entry equal to or greater than the percentile value:
pcen=np.percentile(x,p,interpolation='higher')
# Value of the array entry equal to or smaller than the percentile value:
pcen=np.percentile(x,p,interpolation='lower')
# In either case its index is then abs(x-pcen).argmin(), as above
For EXTREMELY OLD versions of numpy < v1.9.0, the interpolation option is not available, and thus the equivalent is this:
# Calculate 70th percentile:
pcen=np.percentile(x,p)
i_high=np.where(x>=pcen, x-pcen, np.inf).argmin()   # smallest entry >= pcen
i_low=np.where(x<=pcen, x-pcen, -np.inf).argmax()   # largest entry <= pcen
i_near=abs(x-pcen).argmin()
In summary:
i_high points to the array entry holding the smallest value equal to or greater than the requested percentile.
i_low points to the array entry holding the largest value equal to or smaller than the requested percentile.
i_near points to the array entry closest to the percentile, which can be larger or smaller.
My results are:
pcen = 2.3436832738049946
x[i_high] = 2.3523077864975441
x[i_low] = 2.339987054079617
x[i_near] = 2.339987054079617
(i_high, i_low, i_near) = (876, 368, 368)
i.e. location 876 holds the closest value exceeding pcen, but location 368 is even closer, although slightly smaller than the percentile value.
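If you need the indices for several percentiles of the same array, the per-element scan can be replaced by sorting once. The sketch below swaps in a different technique (argsort plus np.searchsorted), not the approach from this answer; neighbour_indices is a hypothetical name, and it assumes each pcen lies between x.min() and x.max(), which holds for percentiles of x.

```python
import numpy as np


def neighbour_indices(x, pcen):
    """Hypothetical helper: indices in the unsorted array x of the largest
    entry <= pcen (i_low) and the smallest entry >= pcen (i_high).
    pcen may be a scalar or an array of percentile values."""
    x = np.asarray(x)
    order = np.argsort(x)  # order[k] is the original index of the k-th smallest entry
    xs = x[order]
    n = len(xs)
    # First sorted position with xs >= pcen, clipped to a valid position
    k_high = np.clip(np.searchsorted(xs, pcen, side="left"), 0, n - 1)
    # Last sorted position with xs <= pcen, clipped to a valid position
    k_low = np.clip(np.searchsorted(xs, pcen, side="right") - 1, 0, n - 1)
    return order[k_low], order[k_high]


x = np.array([5.0, 6, 4, 9, 2, 1, 3, 0, 7, 8])
i_low, i_high = neighbour_indices(x, np.percentile(x, [25, 50, 75]))
print(i_low, i_high)
```

The sort costs O(n log n) once; each further percentile lookup is then a binary search instead of a full scan.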
Upvotes: 6
Reputation: 10995
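You can use numpy's np.percentile as such:
import random
import numpy as np
percentile = 75
mylist = [random.random() for i in range(100)] # random list
# 'nearest' returns an actual list element, so list.index can find it
# (on numpy < 1.22 use interpolation='nearest' instead of method='nearest')
percidx = mylist.index(np.percentile(mylist, percentile, method='nearest'))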
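list.index only exists on Python lists and returns the first match. For a NumPy array, a sketch that returns every position holding the 'nearest' percentile (useful when the data contain repeated values) could use np.flatnonzero; it assumes numpy >= 1.22 for the method keyword, and indices_of_percentile is a hypothetical name.

```python
import numpy as np


def indices_of_percentile(x, p):
    """Hypothetical helper: all indices of x holding the 'nearest' p-th percentile."""
    x = np.asarray(x)
    # 'nearest' returns an actual entry of x, so exact equality is safe here
    value = np.percentile(x, p, method="nearest")
    return np.flatnonzero(x == value)


print(indices_of_percentile([3, 1, 3, 2, 3], 75))
```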
Upvotes: 3
Reputation: 57
Using numpy,
import numpy as np
arr = [12, 19, 11, 28, 10]
p = 0.75
np.argsort(arr)[int((len(arr) - 1) * p)]
This returns 1, the index of 19 (the 75th percentile of arr) in the original, unsorted list.
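Note that int() truncates, so the sorted position is always rounded down. A small variant (the name quantile_index_argsort is hypothetical) that rounds to the nearest position instead, with q given as a fraction between 0 and 1 as in the answer:

```python
import numpy as np


def quantile_index_argsort(arr, q):
    """Hypothetical helper: index in the unsorted arr of the entry at
    quantile q (0 <= q <= 1), rounding the sorted position to the nearest
    integer (Python's round() sends exact halves to the even integer)."""
    order = np.argsort(arr)  # original indices in ascending order of value
    pos = int(round((len(order) - 1) * q))
    return int(order[pos])


arr = [12, 19, 11, 28, 10]
print(quantile_index_argsort(arr, 0.75))
```

The difference shows up for q = 0.65: the sorted position is 2.6, which truncation turns into 2 (value 12) but rounding turns into 3 (value 19).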
Upvotes: 3
Reputation: 63
You can select the values of a df column that lie at or above a designated quantile with quantile():
df_metric_95th_percentile = df['metric'][df['metric'] >= df['metric'].quantile(q=0.95)]
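If you want index labels rather than values, the same mask can be applied to df.index, and the row nearest to the quantile can be found with idxmin. A sketch, assuming a DataFrame with a numeric column named metric (the sample data below is made up for illustration):

```python
import pandas as pd

# Illustrative data; in practice df is your own DataFrame.
df = pd.DataFrame({"metric": [12, 19, 11, 28, 10]}, index=list("abcde"))

threshold = df["metric"].quantile(q=0.75)

# Labels of the rows at or above the 75th percentile
top_labels = df.index[df["metric"] >= threshold]

# Label of the single row whose metric is closest to the 75th percentile
nearest_label = (df["metric"] - threshold).abs().idxmin()

print(list(top_labels), nearest_label)
```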
Upvotes: 1
Reputation: 67427
It is a little convoluted, but you can get what you are after with np.argpartition. Let's take an easy array and shuffle it:
>>> a = np.arange(10)
>>> np.random.shuffle(a)
>>> a
array([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])
If you want to find e.g. the index of quantile 0.25, this would correspond to the item in position idx of the sorted array:
>>> idx = 0.25 * (len(a) - 1)
>>> idx
2.25
You need to figure out how to round that to an int; say you go with the nearest integer:
>>> idx = int(idx + 0.5)
>>> idx
2
If you now call np.argpartition, this is what you get:
>>> np.argpartition(a, idx)
array([7, 5, 4, 3, 2, 1, 6, 0, 8, 9], dtype=int64)
>>> np.argpartition(a, idx)[idx]
4
>>> a[np.argpartition(a, idx)[idx]]
2
It is easy to check that these last two expressions are, respectively, the index and the value of the .25 quantile.
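The steps above can be wrapped into one function. This is a sketch with a hypothetical name, quantile_index_argpartition, using the same round-half-up rule as the answer; because argpartition only partially sorts the array, it typically avoids the cost of a full argsort.

```python
import numpy as np


def quantile_index_argpartition(a, q):
    """Hypothetical helper: index in the unsorted array a of the element at
    quantile q (0 <= q <= 1), rounding the sorted position half up."""
    a = np.asarray(a)
    idx = int(q * (len(a) - 1) + 0.5)  # nearest sorted position, as above
    # After argpartition, position idx holds the index of the idx-th smallest element
    return int(np.argpartition(a, idx)[idx])


a = np.array([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])
i = quantile_index_argpartition(a, 0.25)
print(i, a[i])
```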
Upvotes: 12
Reputation: 6994
Assuming the array is sorted... Unless I'm misunderstanding you, you can compute the index of a percentile by taking the length of the array minus 1, multiplying it by the quantile, and rounding to the nearest integer:
round((len(array) - 1) * (percentile / 100.))
should give you the nearest index to that percentile.
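For a sorted array the expression can be used directly. One detail worth knowing: Python 3's round() sends exact halves to the nearest even integer, so a position of 2.5 becomes 2, not 3. A sketch with a hypothetical helper name:

```python
import numpy as np


def sorted_percentile_index(n, percentile):
    """Hypothetical helper: nearest index for `percentile` (0 to 100) in a
    sorted array of length n, using the expression from the answer."""
    return int(round((n - 1) * (percentile / 100.)))


array = np.sort(np.array([12, 19, 11, 28, 10]))  # sorted copy of the data
i = sorted_percentile_index(len(array), 75)
print(i, array[i])
```

For an unsorted array, the same position can be looked up in np.argsort(array) to get the original index.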
Upvotes: 1