Reputation: 193
I have a DataFrame that looks like this...
idn value
0 ID1 25
1 ID1 30
2 ID2 30
3 ID2 50
I want to add another column to this frame that is the max 'value' grouped by 'idn'
I want a result that looks like this.
idn value max_val
0 ID1 25 30
1 ID1 30 30
2 ID2 30 50
3 ID2 50 50
I can extract the max of 'value' using a group by like so...
df[['idn', 'value']].groupby('idn')['value'].max()
However, I am unable to merge that result back into the original DataFrame.
What is the best way to get the desired result?
Thank You
Upvotes: 18
Views: 12218
Reputation: 1
For anyone else that is interested. The one-liner method for this is to use reset_index.
as such:
df['value_max'] = df[['idn', 'value']].groupby('idn')['value'].max().reset_index(['idn', 'value'], drop = True)
Upvotes: 0
Reputation: 68116
Use the transform
method on a groupby object:
In [5]: df['maxval'] = df.groupby(by=['idn']).transform('max')
In [6]: df
Out[6]:
idn value maxval
0 ID1 25 30
1 ID1 30 30
2 ID2 30 50
3 ID2 50 50
Upvotes: 20
Reputation: 28233
set the index of df
to idn
, and then use df.merge
. after the merge, reset the index and rename columns
dfmax = df.groupby('idn')['value'].max()
df.set_index('idn', inplace=True)
df = df.merge(dfmax, how='outer', left_index=True, right_index=True)
df.reset_index(inplace=True)
df.columns = ['idn', 'value', 'max_value']
Upvotes: 2