Natig Aliyev

Reputation: 389

Which method does pandas use for percentile?

I was trying to understand how pandas calculates the lower and upper percentiles and got a bit confused. Here is the sample code and its output.

import pandas as pd

test = pd.Series([7, 15, 36, 39, 40, 41])
test.describe()

output:

[image: output of test.describe()]

I am interested only in the 25% and 75% percentiles. Which method does pandas use to calculate them?

According to the Wikipedia article on quartiles (https://en.wikipedia.org/wiki/Quartile), the results are different, as follows:

[image: quartile results from the Wikipedia article]

So what statistical/mathematical method does pandas use to calculate percentiles?

Upvotes: 3

Views: 4388

Answers (2)

Natig Aliyev

Reputation: 389

As I mentioned in the comments, I finally figured out how it works by trying the quantile function (from pandas.core.algorithms import quantile), as @Abdou suggested.

I am not good at explaining this in writing, so I will walk through only the given example, for 25% and 75%. Here is a brief (maybe poor) explanation:

For the example list [7, 15, 36, 39, 40, 41], the values map to quantiles as follows:

7 -> 0%

15 -> 20%

36 -> 40%

39 -> 60%

40 -> 80%

41 -> 100%

Since we want the 25th percentile, it lies between 15 (20%) and 36 (40%). More precisely, it sits at 20% + 5%, so its value is 15 + (36-15)/4 = 15 + 5.25 = 20.25.

(36-15)/4 is used because the distance between 15 and 36 covers 40% - 20% = 20%, so we divide it by 4 to get the step for 5%.

The 75th percentile is found the same way. It sits at 60% + 15%, so its value is 39 + 3*(40-39)/4 = 39.75.

That's it. I am really sorry for the poor explanation.
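The interpolation walked through above can be sketched in plain Python. This is only an illustration of the arithmetic, not the pandas source: the helper name linear_quantile is hypothetical, and it assumes the sorted values are spread evenly from 0% to 100%, as in the list above.

```python
import math


def linear_quantile(values, q):
    """Hypothetical helper: quantile by linear interpolation between neighbours.

    Assumes the sorted values sit at evenly spaced positions from 0% to 100%.
    """
    data = sorted(values)
    pos = q * (len(data) - 1)      # fractional index into the sorted data
    lower = math.floor(pos)        # index of the neighbour below
    upper = math.ceil(pos)         # index of the neighbour above
    frac = pos - lower             # how far we are between the two neighbours
    return data[lower] + frac * (data[upper] - data[lower])


sample = [7, 15, 36, 39, 40, 41]
print(linear_quantile(sample, 0.25))  # 20.25
print(linear_quantile(sample, 0.75))  # 39.75
```

For q = 0.25 the fractional index is 1.25, so the result is a quarter of the way from 15 to 36, which matches the 20.25 computed by hand.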

NOTE: Thank you @shin for the correction mentioned in the comment.

Upvotes: 6

Alex

Reputation: 826

It does a [series.quantile(x) for x in percentiles], where percentiles defaults to np.array([0.25, 0.5, 0.75]) if it is not provided.

You can see that in pandas/pandas/core/generic.py

So it uses Series.quantile, whose default interpolation is linear: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.quantile.html
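A quick way to check this relationship is to call Series.quantile directly and compare it with describe(). The sketch below reuses the series from the question; the variable names are only for illustration, and the interpolation argument is shown to hint at why other quartile conventions give different numbers.

```python
import pandas as pd

test = pd.Series([7, 15, 36, 39, 40, 41])

# The default percentiles reported by describe()
percentiles = [0.25, 0.5, 0.75]
by_quantile = [float(test.quantile(q)) for q in percentiles]
print(by_quantile)  # [20.25, 37.5, 39.75]

# describe() should report the same values under the "25%", "50%", "75%" labels
described = test.describe()
print(described["25%"], described["50%"], described["75%"])

# Other interpolation choices pick a neighbouring data point instead of
# interpolating: "lower" gives the neighbour below (15), "higher" the one above (36)
lower_q1 = test.quantile(0.25, interpolation="lower")
higher_q1 = test.quantile(0.25, interpolation="higher")
print(lower_q1, higher_q1)
```

With the default interpolation="linear", the 25% and 75% values agree with the hand calculation in the other answer.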

Upvotes: 1
