Reputation: 1201

Get percentiles from a grouped dataframe

I have a dataframe that has 2 experiment groups and I am trying to get percentile distributions. However, the data is already grouped:

import pandas as pd

df = pd.DataFrame({'group': ['control', 'control', 'control', 'treatment', 'treatment', 'treatment'],
                   'month': [1, 4, 9, 2, 5, 12],
                   'ct': [8, 4, 2, 5, 5, 7]})

I want to calculate which month represents the 25th, 50th, and 75th percentile of each group, but the dataframe is already grouped on the group/month variables.

Update 1: I realize I didn't clarify the trouble I am running into. This is a grouped dataframe, so control, for example, has 8 data points where month = 1, 4 where month = 4, and 2 where month = 9. The following percentile values should be:

x = pd.Series([1,1,1,1,1,1,1,1,4,4,4,4,9,9])
x.quantile([0.25,0.5,0.75])
>> 0.25    1.0
   0.50    1.0
   0.75    4.0
   dtype: float64

Grouping by group and taking quantiles doesn't provide an accurate answer. Is there a way to explode out the counts and take the percentiles of the ungrouped values? Final object should have these values:

             p25 p50 p75
control      1   1   4
treatment    2   5   12

Upvotes: 2

Answers (3)

BENY

Reputation: 323386

You may want to check describe

df.groupby('group').describe().stack()

Upvotes: 0

ALollz

Reputation: 59579

You can use Series.repeat and then get the quantiles:

df.groupby('group').apply(lambda x: (x.month.repeat(x.ct)).quantile([0.25, 0.5, 0.75])).rename_axis([None], axis=1)

           0.25  0.50  0.75
group                      
control     1.0   1.0   4.0
treatment   2.0   5.0  12.0
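
A variant of the same idea, as a minimal sketch: repeat each row ct times first, then take ordinary per-group quantiles and pivot them into columns. The names expanded and result, and the p25/p50/p75 column labels, are illustrative choices that mirror the layout asked for in the question; they are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['control'] * 3 + ['treatment'] * 3,
    'month': [1, 4, 9, 2, 5, 12],
    'ct': [8, 4, 2, 5, 5, 7],
})

# Repeat each row `ct` times so every observation gets its own row.
expanded = df.loc[df.index.repeat(df['ct']), ['group', 'month']]

# Per-group quantiles of the expanded months; unstack moves the
# quantile level of the resulting MultiIndex into columns.
result = (
    expanded.groupby('group')['month']
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

# Illustrative labels (an assumption), matching the question's desired table.
result.columns = ['p25', 'p50', 'p75']
print(result)
```

Expanding the rows keeps the quantile logic identical to the Series example in the question, at the cost of materialising one row per observation, which may be expensive when the counts are large.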

Upvotes: 1

Naga kiran

Reputation: 4607

You can try using quantile on the groupby, passing the required quantiles as a list:

df.groupby('group').quantile([0.25,0.50,0.75])

Out:

                 ct  month
group
control   0.25  3.0    2.5
          0.50  4.0    4.0
          0.75  6.0    6.5
treatment 0.25  5.0    3.5
          0.50  5.0    5.0
          0.75  6.0    8.5

Upvotes: 1
