kms
kms

Reputation: 2024

Pandas Group by and create new column with 25th and 75th percentiles

I have the following pandas DataFrame:

df = pd.DataFrame({
                   'id': [1, 1, 1, 2],
                   'r': [1000, 1300, 1400, 1100],
                   's': [650, 720, 565, 600]
                 })

I'd like to aggregate the DataFrame and create a new column which is a range of r values - 25th and 75th percentiles. The aggregate for s columns is mean.

If there is only one observation for a group, then keep the observations as it is.

Expected output:

id  r               s

1   1075 - 1325     645
2   1100            600

Upvotes: 2

Views: 595

Answers (2)

Scott Boston
Scott Boston

Reputation: 153500

Option two:

g_id = df.groupby('id')

g_id['r'].quantile([.25, .76])\
         .unstack()\
         .assign(s=g_id['s'].agg('mean'))

Output:

      0.25    0.76      s
id                       
1   1150.0  1352.0  645.0
2   1100.0  1100.0  600.0

Details:

Create a groupby object g_id, which we will use a twice.

g_id['r'].quantile([.25,.75]) returns a multiindex Series with out level as id, and the inner level as the label for percentile 25 and 5. You can then unstack this inner level to create columns. Lastly, we assign a new column to this dataframe with the g_id of S aggregated using mean.

Upvotes: 3

mozway
mozway

Reputation: 261590

Here is one option, using Groupby.agg, quantile, and a regex.

NB. I am not fully sure which interpolation method you expect for the quantiles (check the linked documentation, there are several options).

import re

out = (df
 .groupby('id')
 .agg({'r': lambda x: re.sub(r'(\d+(?:\.\d+)?) - \1', r'\1',
                             x.quantile([0.25, 0.75])
                              .astype(str).str.cat(sep=' - ')),
      's': 'mean'})
 )

Output:

                  r      s
id                        
1   1150.0 - 1350.0  645.0
2            1100.0  600.0

Upvotes: 2

Related Questions