Jahar tyagi

Reputation: 101

np.percentile does not seem to be giving the correct output

I have the list below.

33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18

All I am trying to do is find the 25th percentile.

I used a simple numpy program to find it.

import numpy as np
arr = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18]
np.percentile(arr,25)

Output: 19.75

But if we compute it manually or use Excel, the 25th percentile comes out as 19.25.


I expect the output to be 19.25, but the actual output from numpy is 19.75. Can someone please explain what is wrong here?

Upvotes: 1

Views: 2852

Answers (2)

Arkady. A

Reputation: 545

Excel has two percentile functions, PERCENTILE.EXC and PERCENTILE.INC. The difference is that "in the Percentile.Inc function the value of k is within the range 0 to 1 inclusive, and in the Percentile.Exc function, the value of k is within the range 0 to 1 exclusive." (source)

Numpy's percentile function computes the k-th percentile, where k must be between 0 and 100 inclusive (docs). Its default (linear) result matches PERCENTILE.INC; the 19.25 in the question is what PERCENTILE.EXC returns.

Let's check that.

Difference between Excel's PERCENTILE.INC and PERCENTILE.EXC functions

arr = [18, 18, 18, 18, 19, 20, 20, 20, 21, 22, 22, 23, 24, 26, 27, 32, 33, 49, 52, 56]
np.percentile(arr,25)

19.75
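To make the two definitions concrete, here is a minimal pure-Python sketch of both rank formulas. The function names `percentile_inc` and `percentile_exc` are hypothetical helpers written for this illustration, not Excel or numpy APIs, and the formulas are the commonly documented ones: the inclusive variant interpolates at rank p*(n-1) on a 0-based index, the exclusive variant at rank p*(n+1) on a 1-based index.

```python
import math

arr = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
       27, 22, 18, 49, 22, 20, 23, 32, 20, 18]


def percentile_inc(values, p):
    """Inclusive definition (assumed to mirror PERCENTILE.INC); p in [0, 1]."""
    data = sorted(values)
    rank = p * (len(data) - 1)            # 0-based fractional rank
    lo = math.floor(rank)
    hi = min(lo + 1, len(data) - 1)       # stay inside the list when p == 1
    return data[lo] + (rank - lo) * (data[hi] - data[lo])


def percentile_exc(values, p):
    """Exclusive definition (assumed to mirror PERCENTILE.EXC); p strictly inside (0, 1)."""
    data = sorted(values)
    n = len(data)
    rank = p * (n + 1)                    # 1-based fractional rank
    if rank < 1 or rank > n:
        raise ValueError("p is outside the range the exclusive definition supports")
    lo = math.floor(rank)
    hi = min(lo + 1, n)
    return data[lo - 1] + (rank - lo) * (data[hi - 1] - data[lo - 1])


print(percentile_inc(arr, 0.25))  # 19.75, same as np.percentile(arr, 25)
print(percentile_exc(arr, 0.25))  # 19.25, the value reported from Excel
```

With 20 values, the inclusive rank is 0.25 * 19 = 4.75 and the exclusive rank is 0.25 * 21 = 5.25; both fall between the sorted values 19 and 20, which is why the results differ by exactly 0.5.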

Hope that helps

Upvotes: 2

NaN

Reputation: 2332

Check your input values, and look up which method Excel uses, since these are the interpolation options available in numpy:

    import numpy as np

    # Interpolation options accepted by np.percentile
    t = ['linear', 'lower', 'higher', 'nearest', 'midpoint']
    arr = np.array([33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18])
    for i in t:
        # Note: NumPy 1.22+ names this keyword method=; interpolation= is deprecated there
        v = np.percentile(arr, 25., interpolation=i)
        print("type: {} value: {}".format(i, v))

    type: linear value: 19.75
    type: lower value: 19
    type: higher value: 20
    type: nearest value: 20
    type: midpoint value: 19.5
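None of the five options listed above produces 19.25, because they all use the same inclusive rank and only differ in how they pick or interpolate between the two neighbouring values. As a cross-check that does not depend on numpy, here is a small sketch using the standard library's `statistics.quantiles` (Python 3.8+), which supports both an "exclusive" and an "inclusive" method. That these correspond to Excel's EXC and INC functions is an assumption based on the results agreeing for this data set.

```python
import statistics

arr = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
       27, 22, 18, 49, 22, 20, 23, 32, 20, 18]

# n=4 asks for quartile cut points; index 0 is the first quartile (25th percentile)
q1_exclusive = statistics.quantiles(arr, n=4, method="exclusive")[0]
q1_inclusive = statistics.quantiles(arr, n=4, method="inclusive")[0]

print("exclusive:", q1_exclusive)  # 19.25
print("inclusive:", q1_inclusive)  # 19.75
```

In NumPy 1.22 and later, the `method` keyword of `np.percentile` offers further options beyond these five; `method="weibull"` is the one I would expect to match the exclusive result, but verify that against your installed version.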

Upvotes: 1
