Reputation: 2024
I have the following pandas DataFrame:
df = pd.DataFrame({
'id': [1, 1, 1, 2],
'r': [1000, 1300, 1400, 1100],
's': [650, 720, 565, 600]
})
I'd like to aggregate the DataFrame and create a new column which is a range of r
values - 25th
and 75th
percentiles. The aggregate for s
columns is mean
.
If there is only one observation for a group, then keep the observations as it is.
Expected output:
id r s
1 1075 - 1325 645
2 1100 600
Upvotes: 2
Views: 595
Reputation: 153500
Option two:
g_id = df.groupby('id')
g_id['r'].quantile([.25, .76])\
.unstack()\
.assign(s=g_id['s'].agg('mean'))
Output:
0.25 0.76 s
id
1 1150.0 1352.0 645.0
2 1100.0 1100.0 600.0
Details:
Create a groupby object g_id, which we will use a twice.
g_id['r'].quantile([.25,.75]) returns a multiindex Series with out level as id, and the inner level as the label for percentile 25 and 5. You can then unstack this inner level to create columns. Lastly, we assign a new column to this dataframe with the g_id of S aggregated using mean.
Upvotes: 3
Reputation: 261590
Here is one option, using Groupby.agg
, quantile
, and a regex.
NB. I am not fully sure which interpolation method you expect for the quantiles (check the linked documentation, there are several options).
import re
out = (df
.groupby('id')
.agg({'r': lambda x: re.sub(r'(\d+(?:\.\d+)?) - \1', r'\1',
x.quantile([0.25, 0.75])
.astype(str).str.cat(sep=' - ')),
's': 'mean'})
)
Output:
r s
id
1 1150.0 - 1350.0 645.0
2 1100.0 600.0
Upvotes: 2