Pandas Multiindex Groupby aggregate column with value from another column

Question

I have a pandas DataFrame with a MultiIndex, and I want to aggregate the rows that share the same index key. The data looks like this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'S':[0,5,0,5,0,3,5,0],'Q':[6,4,10,6,2,5,17,4],'A':
                  ['A1','A1','A1','A1','A2','A2','A2','A2'],
                  'B':['B1','B1','B2','B2','B1','B1','B1','B2']})
df.set_index(['A','B'])

        Q  S
A  B        
A1 B1   6  0
   B1   4  5
   B2  10  0
   B2   6  5
A2 B1   2  0
   B1   5  3
   B1  17  5
   B2   4  0

I would like to group this DataFrame by (A, B), sum the Q values in each group, and keep the S value from the row that has the largest Q in that group, yielding this:

df2 = pd.DataFrame({'S':[0,0,5,0],'Q':[10,16,24,4],'A':
                   ['A1','A1','A2','A2'],
                  'B':['B1','B2','B1','B2']})
df2.set_index(['A','B'])

        Q  S
A  B        
A1 B1  10  0
   B2  16  0
A2 B1  24  5
   B2   4  0

I tried the following, but it didn't work:

df.groupby(by=['A','B']).agg({'Q':'sum','S':df.S[df.Q.idxmax()]})

Any hints?

Scott Boston · Accepted Answer

One way is to use agg, apply, and join:

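# Assumes the index was set and re-assigned first: df = df.set_index(['A','B'])
g = df.groupby(['A','B'], group_keys=False)
# Per group, keep S from the row(s) where Q is maximal, then join the per-group sum of Q
g.apply(lambda x: x.loc[x.Q == x.Q.max(),['S']]).join(g.agg({'Q':'sum'}))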

Output:

       S   Q
A  B        
A1 B1  0  10
   B2  0  16
A2 B1  5  24
   B2  0   4
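
As an alternative to apply, here is a minimal sketch that uses idxmax to find the row holding the largest Q in each group and then looks up S by that row label. It assumes df still has its default integer index (that is, set_index was not re-assigned), and the names grouped, idx and result are only illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'S': [0, 5, 0, 5, 0, 3, 5, 0],
                   'Q': [6, 4, 10, 6, 2, 5, 17, 4],
                   'A': ['A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2'],
                   'B': ['B1', 'B1', 'B2', 'B2', 'B1', 'B1', 'B1', 'B2']})

# Assumption: A and B are ordinary columns and the row index is unique,
# so the labels returned by idxmax identify exactly one row each.
grouped = df.groupby(['A', 'B'])

# Row label of the largest Q within each (A, B) group
idx = grouped['Q'].idxmax()

# Per-group sum of Q, indexed by (A, B)
result = grouped['Q'].sum().to_frame()

# idx and result share the same sorted group order, so S can be
# attached positionally after looking it up by row label.
result['S'] = df.loc[idx, 'S'].to_numpy()

print(result)
```

One difference worth noting: when several rows in a group tie for the largest Q, idxmax picks only the first of them, so this version always yields one row per group, whereas the boolean mask x.Q == x.Q.max() keeps every tied row.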
