Reputation: 1057
I want to create a new column that holds, for each group, the value of column A at the row where column B reaches its maximum. This is best explained by example:
import pandas as pd

data = {'group': ['g1', 'g1', 'g1', 'g1', 'g1', 'g1', 'g2', 'g2', 'g2', 'g2', 'g2'],
        'A': [3, 1, 8, 2, 6, -1, 0, 13, -4, 0, 1],
        'B': [5, 2, 3, 7, 11, -1, 4, -1, 1, 0, 2]}
df = pd.DataFrame(data)
df
The following works as a workaround, but I have a feeling that there is a better way to do it:
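# Put A into 'Amax' only at the row where B is largest in each group (NaN elsewhere)
df.loc[:, 'Amax'] = df.loc[df.groupby('group')['B'].idxmax(), 'A']
# Spread that single non-NaN value across the whole group
df.loc[:, 'Amax'] = df.groupby('group')['Amax'].transform('median')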
group A B Amax
0 g1 3 5 6.0
1 g1 1 2 6.0
2 g1 8 3 6.0
3 g1 2 7 6.0
4 g1 6 11 6.0
5 g1 -1 -1 6.0
6 g2 0 4 0.0
7 g2 13 -1 0.0
8 g2 -4 1 0.0
9 g2 0 0 0.0
10 g2 1 2 0.0
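To make it clearer why the two-step workaround above ends up with floats, here is a small sketch of the intermediate state. It assumes the same example data; the names `idx` and `n_missing` are only illustrative. The first assignment aligns on the index and leaves NaN in every row except the one holding each group's maximum of B, and the median then skips those NaNs.

```python
import pandas as pd

data = {'group': ['g1', 'g1', 'g1', 'g1', 'g1', 'g1', 'g2', 'g2', 'g2', 'g2', 'g2'],
        'A': [3, 1, 8, 2, 6, -1, 0, 13, -4, 0, 1],
        'B': [5, 2, 3, 7, 11, -1, 4, -1, 1, 0, 2]}
df = pd.DataFrame(data)

# Row labels of the maximum of B in each group
idx = df.groupby('group')['B'].idxmax()

# Index-aligned assignment: only the rows in idx get a value, the rest become NaN,
# which also forces the column to a float dtype
df['Amax'] = df.loc[idx, 'A']
n_missing = int(df['Amax'].isna().sum())

# The median ignores NaN, so each group collapses to its single non-NaN value
df['Amax'] = df.groupby('group')['Amax'].transform('median')
```

That NaN detour is why the result shows 6.0 and 0.0 rather than integers.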
Upvotes: 0
Views: 92
Reputation: 863791
Use DataFrame.set_index with GroupBy.transform. Because the result is indexed by A rather than by the original index, assign the array created by Series.to_numpy so that no index alignment takes place:
df['Amax'] = df.set_index('A').groupby('group')['B'].transform('idxmax').to_numpy()
print(df)
group A B Amax
0 g1 3 5 6
1 g1 1 2 6
2 g1 8 3 6
3 g1 2 7 6
4 g1 6 11 6
5 g1 -1 -1 6
6 g2 0 4 0
7 g2 13 -1 0
8 g2 -4 1 0
9 g2 0 0 0
10 g2 1 2 0
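For anyone wondering why the to_numpy call is needed, the one-liner can be unpacked roughly as follows. This is only a sketch on the example data, and the intermediate names `by_a` and `amax` are made up for illustration. After set_index('A'), idxmax returns values of A instead of row numbers, but the transformed Series is still indexed by A.

```python
import pandas as pd

data = {'group': ['g1', 'g1', 'g1', 'g1', 'g1', 'g1', 'g2', 'g2', 'g2', 'g2', 'g2'],
        'A': [3, 1, 8, 2, 6, -1, 0, 13, -4, 0, 1],
        'B': [5, 2, 3, 7, 11, -1, 4, -1, 1, 0, 2]}
df = pd.DataFrame(data)

# With A as the index, idxmax yields the A value sitting next to the largest B
by_a = df.set_index('A')
amax = by_a.groupby('group')['B'].transform('idxmax')

# amax keeps the row order of df but is labelled by A, not by 0..10,
# so assign the raw values positionally instead of letting pandas align labels
df['Amax'] = amax.to_numpy()
```

Assigning `amax` directly would try to match its A labels against the 0..10 index of df, which is not what is wanted here.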
Upvotes: 4
Reputation: 323396
Use transform with idxmax to get each group's row label on every row, then look up A at those labels:
df['Amax']=df.loc[df.groupby('group')['B'].transform('idxmax'),'A'].values
df
Out[42]:
group A B Amax
0 g1 3 5 6
1 g1 1 2 6
2 g1 8 3 6
3 g1 2 7 6
4 g1 6 11 6
5 g1 -1 -1 6
6 g2 0 4 0
7 g2 13 -1 0
8 g2 -4 1 0
9 g2 0 0 0
10 g2 1 2 0
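A different route, not taken from either answer above, is to build a small group-to-value lookup and map it back onto the group column. This is a minimal sketch on the same example data; `best` is just an illustrative name.

```python
import pandas as pd

data = {'group': ['g1', 'g1', 'g1', 'g1', 'g1', 'g1', 'g2', 'g2', 'g2', 'g2', 'g2'],
        'A': [3, 1, 8, 2, 6, -1, 0, 13, -4, 0, 1],
        'B': [5, 2, 3, 7, 11, -1, 4, -1, 1, 0, 2]}
df = pd.DataFrame(data)

# One row per group: the row where B is largest, reduced to a group -> A lookup
best = df.loc[df.groupby('group')['B'].idxmax()].set_index('group')['A']

# Broadcast the lookup back to every row via the group column
df['Amax'] = df['group'].map(best)
```

This keeps the integer dtype and does not depend on the original index being 0..n-1, at the cost of one extra intermediate Series.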
Upvotes: 6