Reputation: 10993
Currently there is a median method on pandas GroupBy objects.
Is there a way to calculate an arbitrary percentile (see: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html) on the groupings?
The median would be the calculation of the percentile with q=50.
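A quick check of that equivalence, as a minimal sketch on made-up data (the frame and the column names `key` and `val` are illustrative): the group median, the 0.5 quantile from `GroupBy.quantile`, and `numpy.percentile` with q=50 should all agree per group.

```python
import numpy as np
import pandas as pd

# Illustrative data: two groups, 'a' and 'b'
df = pd.DataFrame({
    "key": ["a", "a", "a", "b", "b", "b", "b"],
    "val": [1.0, 2.0, 10.0, 4.0, 5.0, 6.0, 7.0],
})

grouped = df.groupby("key")["val"]

median = grouped.median()                             # built-in median per group
q50 = grouped.quantile(0.5)                           # 0.5 quantile per group (q on a 0-1 scale)
p50 = grouped.apply(lambda s: np.percentile(s, 50))   # numpy percentile (q on a 0-100 scale)

print(pd.DataFrame({"median": median, "quantile_0.5": q50, "percentile_50": p50}))
```

Note the two scales: `quantile` takes 0.5 where `np.percentile` takes 50.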
Upvotes: 37
Views: 72115
Reputation: 552
Building on my original answer, you can now use the pandas-wizard package from PyPI and simply write:
import numpy as np
import pandaswizard as pdw  # attempt to create a ubiquitous naming
column.agg([np.sum, np.mean, pdw.percentile(50), pdw.quantile(0.95)])
Note that the module mimics both quantile and percentile using pandas' own pd.Series.quantile(), and keyword arguments such as interpolation (or method, the name used in numpy) are accepted. It is a wrapper that allows calculating with any of the methods numpy defines.
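The pandaswizard signatures are not reproduced here. As a hypothetical sketch of the same idea in plain pandas, the helper below (the name `make_quantile` is made up for illustration) returns a named function that calls `pd.Series.quantile` and forwards the `interpolation` keyword, so it can sit in an `agg` list next to built-in aggregations. It is applied inside a groupby here, which is an assumption about how `column` is used.

```python
import pandas as pd

def make_quantile(q, interpolation="linear"):
    """Hypothetical helper: return an aggregation function computing the q-quantile."""
    def quantile_(s):
        # pd.Series.quantile accepts 'linear', 'lower', 'higher', 'midpoint', 'nearest'
        return s.quantile(q, interpolation=interpolation)
    # The function name becomes the column label in agg's output
    quantile_.__name__ = f"quantile_{q}_{interpolation}"
    return quantile_

df = pd.DataFrame({
    "key": ["a", "a", "a", "a", "b", "b"],
    "val": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0],
})

result = df.groupby("key")["val"].agg(
    ["sum", "mean", make_quantile(0.5), make_quantile(0.5, "lower")]
)
print(result)
```

With `interpolation="lower"`, group `a` gets 2.0 instead of the interpolated 2.5, because the lower of the two middle values is taken.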
Upvotes: 0
Reputation: 4603
With pandas >= 0.25.0 you can also use named aggregation.
An example:
import numpy
import pandas as pd
df = pd.DataFrame({'A': numpy.random.randint(1,3,size=100),'C': numpy.random.randn(100)})
df.groupby('A').agg(min_val = ('C','min'), percentile_80 = ('C',lambda x: x.quantile(0.8)))
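The same aggregation can also be spelled with `pd.NamedAgg`, the explicit form of the `('C', func)` tuples. This is a minimal sketch; it swaps in a seeded `numpy.random.default_rng` generator (an addition, not in the original) so the data are reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible
df = pd.DataFrame({"A": rng.integers(1, 3, size=100), "C": rng.standard_normal(100)})

# pd.NamedAgg(column=..., aggfunc=...) is equivalent to the ('C', func) tuple form
result = df.groupby("A").agg(
    min_val=pd.NamedAgg(column="C", aggfunc="min"),
    percentile_80=pd.NamedAgg(column="C", aggfunc=lambda x: x.quantile(0.8)),
)
print(result)
```

The keyword names (`min_val`, `percentile_80`) become the output column names, so no `__name__` tricks are needed.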
Upvotes: 11
Reputation: 742
I found another useful solution here.
If I have to use groupby, another approach can be:
import numpy as np

def percentile(n):
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n  # label for the output column
    return percentile_
With the call below, I get the same result as the solution given by @TomAugspurger:
df.groupby('C').agg([percentile(50), percentile(95)])
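A self-contained run of that approach on small made-up data (the values are illustrative). The `__name__` assignment is what gives each output column a readable label; without it both functions would be called `percentile_`, and recent pandas versions reject duplicate function names in `agg`.

```python
import numpy as np
import pandas as pd

def percentile(n):
    def percentile_(x):
        return np.percentile(x, n)  # n is on a 0-100 scale
    percentile_.__name__ = "percentile_%s" % n  # label used for the output column
    return percentile_

df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0],
    "B": [0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 4.0, 6.0],
    "C": ["one"] * 5 + ["two"] * 3,
})

result = df.groupby("C").agg([percentile(50), percentile(95)])
print(result)
```

The result has two-level columns such as `("A", "percentile_50")`. Keep in mind that `np.percentile` expects 50 and 95 here, whereas `quantile` would expect 0.5 and 0.95.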
Upvotes: 21
Reputation: 28936
You want the quantile method:
In [47]: df
Out[47]:
           A         B    C
0   0.719391  0.091693  one
1   0.951499  0.837160  one
2   0.975212  0.224855  one
3   0.807620  0.031284  one
4   0.633190  0.342889  one
5   0.075102  0.899291  one
6   0.502843  0.773424  one
7   0.032285  0.242476  one
8   0.794938  0.607745  one
9   0.620387  0.574222  one
10  0.446639  0.549749  two
11  0.664324  0.134041  two
12  0.622217  0.505057  two
13  0.670338  0.990870  two
14  0.281431  0.016245  two
15  0.675756  0.185967  two
16  0.145147  0.045686  two
17  0.404413  0.191482  two
18  0.949130  0.943509  two
19  0.164642  0.157013  two

In [48]: df.groupby('C').quantile(.95)
Out[48]:
            A         B
C
one  0.964541  0.871332
two  0.826112  0.969558
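A few variations on `GroupBy.quantile`, as a sketch on small made-up data: selecting a single column, passing a list of quantiles (supported in recent pandas versions; the result gets a (group, q) index), and `unstack()` to put each quantile in its own column.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0],
    "B": [0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 4.0, 6.0],
    "C": ["one"] * 5 + ["two"] * 3,
})

# One quantile for a single column: a Series indexed by group
a_95 = df.groupby("C")["A"].quantile(0.95)

# Several quantiles at once: rows are indexed by (group, q)
several = df.groupby("C").quantile([0.05, 0.5, 0.95])

# unstack() moves q into the columns, one row per group
wide = df.groupby("C")["A"].quantile([0.05, 0.5, 0.95]).unstack()

print(a_95, several, wide, sep="\n\n")
```

The default linear interpolation is used throughout; `quantile` also takes an `interpolation` argument if another rule is needed.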
Upvotes: 59