Reputation: 1201
I have a dataframe that has 2 experiment groups and I am trying to get percentile distributions. However, the data is already grouped:
df = pd.DataFrame({'group': ['control', 'control', 'control','treatment','treatment','treatment'],
'month': [1,4,9,2,5,12],
'ct': [8,4,2,5,5,7]})
I want to calculate which month represents the 25th, 50th, and 75th percentile of each group, but the dataframe is already grouped on the group/month variables.
Update 1: I realize I didn't clarify the trouble I am running into. This is a grouped dataframe, so control, for example, has 8 data points where month = 1, 4 where month = 4, and 2 where month = 9. The percentile values for control should therefore be:
x = pd.Series([1,1,1,1,1,1,1,1,4,4,4,4,9,9])
x.quantile([0.25,0.5,0.75])
>> 0.25 1.0
0.50 1.0
0.75 4.0
dtype: float64
Grouping by group and taking quantiles doesn't provide an accurate answer. Is there a way to explode out the counts and take the percentiles of the ungrouped values? Final object should have these values:
p25 p50 p75
control 1 1 4
treatment 2 5 12
Upvotes: 2
Views: 1654
Reputation: 323226
You may want to check describe, which reports the 25%, 50%, and 75% quantiles of each numeric column per group:
df.groupby('group').describe().stack()
Upvotes: 0
Reputation: 59519
You can use Series.repeat to repeat each month by its count and then get the quantiles:
df.groupby('group').apply(lambda x: (x.month.repeat(x.ct)).quantile([0.25, 0.5, 0.75])).rename_axis([None], axis=1)
0.25 0.50 0.75
group
control 1.0 1.0 4.0
treatment 2.0 5.0 12.0
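The same idea can be written without groupby.apply, as a rough sketch: repeat each row label ct times so that every original observation gets its own row, then take the per-group quantiles of month. The column labels p25, p50, and p75 are only there to match the layout requested in the question, and the intermediate name expanded is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['control', 'control', 'control', 'treatment', 'treatment', 'treatment'],
    'month': [1, 4, 9, 2, 5, 12],
    'ct': [8, 4, 2, 5, 5, 7],
})

# Repeat each row label ct times, so every counted observation becomes its own row.
expanded = df.loc[df.index.repeat(df['ct']), ['group', 'month']]

# Quantiles of month within each group, pivoted to one row per group.
result = (
    expanded.groupby('group')['month']
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

# Illustrative column labels taken from the question's desired output.
result.columns = ['p25', 'p50', 'p75']
print(result)
```

Note that this materializes one row per observation, so it may use a lot of memory if the counts are very large.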
Upvotes: 1
Reputation: 4607
You can try using quantile on the groupby object, passing the required quantiles as a list:
df.groupby('group').quantile([0.25,0.50,0.75])
Out:
ct month
group
control 0.25 3.0 2.5
0.50 4.0 4.0
0.75 6.0 6.5
treatment 0.25 5.0 3.5
0.50 5.0 5.0
0.75 6.0 8.5
Upvotes: 1