Reputation: 1322
I'm attempting to groupby A, find the max of B, and for that max return the C associated with it.
import pandas as pd
a1 = [1,2,'x']
a2 = [1,3,'y']
a3 = [2,3,'y']
a4 = [2,4,'z']
df = pd.DataFrame([a1,a2,a3,a4], columns = ['A','B','C'])
groups = df.groupby('A')['B'].max()
df = df.join(groups, on='A', rsuffix = '_max')
df
result i get is
A B C B_max
0 1 2 x 3
1 1 3 y 3
2 2 3 y 4
3 2 4 z 4
result i want is
A B C B_max C_max
0 1 2 x 3 y
1 1 3 y 3 y
2 2 3 y 4 z
3 2 4 z 4 z
Upvotes: 3
Views: 155
Reputation: 13757
Well here are a few options (no doubt there are more). (1) This one is just a wrinkle on what you're doing (it grabs the C_max column by using the idxmax
of column B):
>>> foo = lambda x: pd.Series({'B_max':x.B.max(),'C_max':x.C[x.B.idxmax()]})
>>> print df.join(df.groupby('A').apply(foo),on='A')
A B C B_max C_max
0 1 2 x 3 y
1 1 3 y 3 y
2 2 3 y 4 z
3 2 4 z 4 z
(2) Or maybe this is cleaner:
>>> agg = df.groupby('A').agg(lambda x: x.loc[x.B.idxmax(),:])
>>> print df.join(agg, on='A', rsuffix = '_max')
A B C B_max C_max
0 1 2 x 3 y
1 1 3 y 3 y
2 2 3 y 4 z
3 2 4 z 4 z
(3) Or you could do it all in a function that was called by the groupby/apply
:
def foo(g):
g['B_max'] = g.B.max()
g['C_max'] = g.C[g.B.idxmax()]
return g
print df.groupby('A').apply(foo)
A B C B_max C_max
0 1 2 x 3 y
1 1 3 y 3 y
2 2 3 y 4 z
3 2 4 z 4 z
Upvotes: 1