Alex Rothberg

Reputation: 10993

Calculate Arbitrary Percentile on Pandas GroupBy

Currently, pandas GroupBy objects have a median method.

Is there a way to calculate an arbitrary percentile (see: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html) on the groupings?

The median would be the percentile calculation with q=50.
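To make the equivalence concrete, here is a minimal sketch on made-up data (the column names `group` and `value` are illustrative, not from the question). It compares the built-in group median with the 50th percentile computed two ways. Note that pandas `quantile` takes q on a 0 to 1 scale, while `np.percentile` takes it on a 0 to 100 scale.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two groups of three values each.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 40.0],
})

grouped = df.groupby("group")["value"]

# Built-in median per group.
median_by_group = grouped.median()

# Same statistic via the pandas quantile method (q in [0, 1]).
q50_by_group = grouped.quantile(0.5)

# Same statistic via numpy.percentile (q in [0, 100]).
np_q50_by_group = grouped.apply(lambda s: np.percentile(s, 50))

print(median_by_group)
print(q50_by_group)
print(np_q50_by_group)
```

All three should agree here, since both libraries default to linear interpolation.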

Upvotes: 37

Views: 72115

Answers (4)

Mr. Hobo

Reputation: 552

Based on my original answer, using the pandas-wizard package from PyPI, you can now simply write:

import numpy as np
import pandaswizard as pdw  # an attempt at a ubiquitous alias
# `column` is assumed to be an existing pandas Series or grouped column
column.agg([np.sum, np.mean, pdw.percentile(50), pdw.quantile(0.95)])

Note that the module implements both quantile and percentile on top of pd.Series.quantile(), and arguments such as interpolation (called method in numpy) are allowed. It is a wrapper that lets you calculate with any of the methods numpy defines.

Upvotes: 0

sushmit

Reputation: 4603

With pandas >= 0.25.0 you can also use named aggregation.

For example:

import numpy
import pandas as pd

df = pd.DataFrame({
    'A': numpy.random.randint(1, 3, size=100),
    'C': numpy.random.randn(100),
})
df.groupby('A').agg(
    min_val=('C', 'min'),
    percentile_80=('C', lambda x: x.quantile(0.8)),
)

Upvotes: 11

leocrimson

Reputation: 742

I found another useful solution here

If you have to use groupby, another approach is:

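import numpy as np

def percentile(n):
    # Return an aggregation function that computes the n-th percentile (n in 0-100).
    def percentile_(x):
        return np.percentile(x, n)
    # Name the function so the resulting column is labelled, e.g. 'percentile_95'.
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_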

With the call below, I get the same result as the solution given by @TomAugspurger:

df.groupby('C').agg([percentile(50), percentile(95)])
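As a self-contained illustration of that factory, here is a sketch on a small made-up frame (the data is hypothetical, chosen so the percentiles are easy to check by hand). It selects a single column before aggregating so that the output columns are simply the function names.

```python
import numpy as np
import pandas as pd

def percentile(n):
    """Build an aggregation function for the n-th percentile (n in 0-100)."""
    def percentile_(x):
        return np.percentile(x, n)
    # The function name becomes the output column label.
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

# Hypothetical toy data: group 'one' has five values, group 'two' has two.
df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0],
    'C': ['one'] * 5 + ['two'] * 2,
})

# One row per group, one column per requested percentile.
result = df.groupby('C')['A'].agg([percentile(50), percentile(95)])
print(result)
```

Setting `__name__` matters because two anonymous lambdas in the same list would otherwise produce clashing column labels.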

Upvotes: 21

TomAugspurger

Reputation: 28936

You want the quantile method:

In [47]: df
Out[47]: 
           A         B    C
0   0.719391  0.091693  one
1   0.951499  0.837160  one
2   0.975212  0.224855  one
3   0.807620  0.031284  one
4   0.633190  0.342889  one
5   0.075102  0.899291  one
6   0.502843  0.773424  one
7   0.032285  0.242476  one
8   0.794938  0.607745  one
9   0.620387  0.574222  one
10  0.446639  0.549749  two
11  0.664324  0.134041  two
12  0.622217  0.505057  two
13  0.670338  0.990870  two
14  0.281431  0.016245  two
15  0.675756  0.185967  two
16  0.145147  0.045686  two
17  0.404413  0.191482  two
18  0.949130  0.943509  two
19  0.164642  0.157013  two

In [48]: df.groupby('C').quantile(.95)
Out[48]: 
            A         B
C                      
one  0.964541  0.871332
two  0.826112  0.969558
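If several percentiles are needed at once, the same method can be given a list. The sketch below assumes a pandas version in which GroupBy.quantile accepts a list for q, and uses made-up data rather than the frame above. Note that q is on a 0 to 1 scale here, unlike numpy.percentile.

```python
import pandas as pd

# Hypothetical toy data: group 'one' has five values, group 'two' has two.
df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0],
    'C': ['one'] * 5 + ['two'] * 2,
})

# Passing a list of quantiles gives a result indexed by (group, quantile).
quantiles = df.groupby('C').quantile([0.5, 0.95])
print(quantiles)

# Optionally pivot the quantile level into columns: one row per group.
wide = quantiles['A'].unstack()
print(wide)
```

The unstacked form is often easier to read when there are only a few quantiles per group.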

Upvotes: 59
