add-semi-colons

Reputation: 18840

python: numpy - calculate percentile with linear interpolation

I am trying to calculate percentiles. After reading the Wikipedia article, I implemented the simple formula:

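def percentile(numList, P):
    # P is a fraction between 0.0 and 1.0
    numList.sort()  # note: sorts the caller's list in place
    n = int(round(P * len(numList) + 0.5))
    if n > 1:
        return numList[n-2]
    else:
        # nothing below the first rank, so fall back to 0
        return 0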

But what I want is the interpolation version described on Wikipedia (http://en.wikipedia.org/wiki/Percentile#Linear_interpolation_between_closest_ranks). I searched Google and found numpy, but I don't think I am getting the correct value from it, even for the simple formula. And when I try to pass in a value to select the interpolation, it gives me an error (http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html).

Let's start with the following list:

B = [15, 20, 35, 40, 50]

According to my method, I get an actual value from the original list that represents the percentile I am looking for:

>>> print percentile(B, P=0.)
0
>>> print percentile(B, P=0.1)
0
>>> print percentile(B, P=0.2)
15
>>> print percentile(B, P=0.3)
15
>>> print percentile(B, P=0.4)
20
>>> print percentile(B, P=0.5)
20
>>> print percentile(B, P=0.6)
35
>>> print percentile(B, P=0.7)
35
>>> print percentile(B, P=0.8)
40
>>> print percentile(B, P=0.9)
40
>>> print percentile(B, P=0.95)
40
>>> print percentile(B, P=1.0)
50

But if I use numpy, I don't get actual values from the original list:

>>> np.percentile(B, 0.1)
15.02
>>> np.percentile(B, 0.2)
15.039999999999999
>>> np.percentile(B, 0.3)
15.06
>>> np.percentile(B, 0.4)
15.08
>>> np.percentile(B, 0.5)
15.1
>>> np.percentile(B, 0.6)
15.120000000000001
>>> np.percentile(B, 0.7)
15.140000000000001
>>> np.percentile(B, 0.8)
15.16
>>> np.percentile(B, 0.9)
15.18
>>> np.percentile(B, 1)
15.199999999999999
>>> np.percentile(B, 10)
17.0
>>> np.percentile(B, 20)
19.0
>>> np.percentile(B, 30)
23.0
>>> np.percentile(B, 40)
29.0
>>> np.percentile(B, 50)
35.0

My question is: given an array, how can I get the values from that array that represent percentiles such as 10, 20...100, using the linear interpolation technique to calculate the percentile?

Upvotes: 2

Views: 2682

Answers (2)

Alveoli

Reputation: 1212

I had the same problem. For me, it was simple: I thought that the percentile parameter (you call it P) was a float in the range 0.0-1.0, where 1.0 represents the 100th percentile.

I just read the manual and found that this parameter (numpy calls it q) is in the range 0-100, where 100 represents the 100th percentile.

numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear')

q : float in range of [0,100] (or sequence of floats) Percentile to compute which must be between 0 and 100 inclusive.

http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html
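To make the difference concrete, here is a small sketch that reuses the list B from the question and passes q once as a fraction and once in percent. The variable names are only illustrative; the call relies on numpy's default settings.

```python
import numpy as np

B = [15, 20, 35, 40, 50]

# q is interpreted in percent (0-100), so 0.5 means the 0.5th percentile,
# which lies just above the minimum of the list.
as_fraction = np.percentile(B, 0.5)

# Passing 50 asks for the 50th percentile (the median).
as_percent = np.percentile(B, 50)

# q may also be a sequence, which returns one value per requested percentile.
results = np.percentile(B, [10, 20, 30, 40, 50])

print(as_fraction, as_percent)
print(results)
```

So a fraction such as P=0.5 from the question has to be multiplied by 100 before it is handed to numpy.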

Hope that helps!

Upvotes: 1

space

Reputation: 417

numpy is doing the right thing.

Your code returns the percentile of numList + [0], i.e., of a set that also includes 0.

The 0th percentile item would be the lowest item in numList, which in the example is 15.
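For reference, here is a minimal pure-Python sketch of linear interpolation between closest ranks, assuming the convention in which the 0th percentile is the minimum and the 100th is the maximum. The function name percentile_linear is hypothetical, and this is not numpy's actual implementation, although for the list in the question it agrees with the numpy values shown there (for example 17.0 at q=10 and 35.0 at q=50).

```python
def percentile_linear(values, q):
    """Percentile q (0-100) by linear interpolation between closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    data = sorted(values)  # work on a sorted copy, leave the input untouched
    rank = (len(data) - 1) * q / 100.0  # fractional index into the sorted data
    lower = int(rank)  # index of the closest rank at or below
    upper = min(lower + 1, len(data) - 1)  # closest rank above, clamped
    fraction = rank - lower
    # Move from the lower value towards the upper one by the fractional part.
    return data[lower] + (data[upper] - data[lower]) * fraction


B = [15, 20, 35, 40, 50]
for q in (0, 10, 20, 30, 40, 50, 100):
    print(q, percentile_linear(B, q))
```

An interpolated result lands on an element of the list only when the fractional rank happens to be a whole number. If values taken from the list are required, numpy.percentile also documents non-interpolating options such as 'lower', 'higher' and 'nearest'; the keyword used to select them has changed between numpy versions, so check the documentation for the installed release.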

Upvotes: 0
