Reputation: 18840
I am trying to calculate percentiles. After reading the Wikipedia article, I implemented the simple formula:
def _percentile(numList, percentile):
    numList.sort()
    n = int(round(percentile * len(numList) + 0.5))
    if n > 1:
        return numList[n-2]
    else:
        return 0
But what I want is the interpolation version described in the wiki: (http://en.wikipedia.org/wiki/Percentile#Linear_interpolation_between_closest_ranks). I searched Google and found numpy, but I don't think I am getting the correct value when I use it, even for the simple formula. And when I try to pass in a value to select the interpolation, it gives me an error. (http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html)
Let's start with the following list:
B = [15, 20, 35, 40, 50]
With my method, I get an actual value from the original list that represents the percentile I am looking for:
>>> print percentile(B, P=0.)
0
>>> print percentile(B, P=0.1)
0
>>> print percentile(B, P=0.2)
15
>>> print percentile(B, P=0.3)
15
>>> print percentile(B, P=0.4)
20
>>> print percentile(B, P=0.5)
20
>>> print percentile(B, P=0.6)
35
>>> print percentile(B, P=0.7)
35
>>> print percentile(B, P=0.8)
40
>>> print percentile(B, P=0.9)
40
>>> print percentile(B, P=0.95)
40
>>> print percentile(B, P=1.0)
50
But if I use numpy, I don't get an actual value from the original list:
>>> np.percentile(B, 0.1)
15.02
>>> np.percentile(B, 0.2)
15.039999999999999
>>> np.percentile(B, 0.3)
15.06
>>> np.percentile(B, 0.4)
15.08
>>> np.percentile(B, 0.5)
15.1
>>> np.percentile(B, 0.6)
15.120000000000001
>>> np.percentile(B, 0.7)
15.140000000000001
>>> np.percentile(B, 0.8)
15.16
>>> np.percentile(B, 0.9)
15.18
>>> np.percentile(B, 1)
15.199999999999999
>>> np.percentile(B, 10)
17.0
>>> np.percentile(B, 20)
19.0
>>> np.percentile(B, 30)
23.0
>>> np.percentile(B, 40)
29.0
>>> np.percentile(B, 50)
35.0
My question is: given an array, how can I get the value from that array that represents percentiles such as 10, 20, ..., 100, using the linear interpolation technique?
Upvotes: 2
Views: 2682
Reputation: 1212
I had the same problem. For me, it was simple: I thought that the percentile parameter (you call it P) was a float in the range 0.0-1.0, where 1.0 represents the 100th percentile.
I just read the manual and found that the parameter (numpy calls it q) is in the range 0-100, where 100 represents the 100th percentile.
numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear')
q : float in range of [0,100] (or sequence of floats) Percentile to compute which must be between 0 and 100 inclusive.
http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html
Hope that helps!
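To illustrate the scaling issue, here is a minimal sketch that converts a 0.0-1.0 fraction to the 0-100 range before calling numpy.percentile. The helper name percentile_from_fraction is made up for this example and is not part of numpy.

```python
import numpy as np

B = [15, 20, 35, 40, 50]

def percentile_from_fraction(values, fraction):
    """Hypothetical helper: takes a fraction in [0.0, 1.0], not a percent."""
    # numpy.percentile expects q in [0, 100], so scale the fraction up.
    return float(np.percentile(values, fraction * 100.0))

# These correspond to np.percentile(B, 10) and np.percentile(B, 50) in the question.
p10 = percentile_from_fraction(B, 0.1)
p50 = percentile_from_fraction(B, 0.5)
print(p10, p50)
```

With the fraction scaled this way, 0.5 gives the median (35 for this list) rather than a value just above the minimum.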
Upvotes: 1
Reputation: 417
numpy is doing the right thing.
Your code is returning the percentile of numList + [0], i.e., a set that includes 0.
The 0th percentile item would be the lowest item in numList, which in the example is 15.
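For comparison, here is a rough pure-Python sketch of linear interpolation between closest ranks. It assumes the rank convention rank = (n - 1) * q / 100, which reproduces the np.percentile values shown in the question; the function name percentile_linear is only illustrative.

```python
import math

def percentile_linear(values, q):
    """Percentile q (0-100) by linear interpolation between closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    data = sorted(values)                 # leave the caller's list untouched
    rank = (len(data) - 1) * q / 100.0    # fractional index into the sorted data
    lower = int(math.floor(rank))
    upper = int(math.ceil(rank))
    if lower == upper:
        # The rank lands exactly on an element, so no interpolation is needed.
        return float(data[lower])
    weight = rank - lower                 # distance past the lower neighbour
    return data[lower] + (data[upper] - data[lower]) * weight

B = [15, 20, 35, 40, 50]
print(percentile_linear(B, 0))    # lowest item in B
print(percentile_linear(B, 100))  # highest item in B
```

Under this convention the 0th percentile is the smallest element and the 100th is the largest, so no extra 0 is ever introduced.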
Upvotes: 0