Reputation: 254
I have the following dataframe named df:
name score data1 data2
Name1 23 'a' 'b'
Name2 2 'c' 'd'
Name1 100 'b' 'f'
Name3 5 'a' 'f'
Name2 6 'e' 'g'
Name3 500 'f' 'n'
The desired result:
Find max score for each name and return the associated data (data1 and data 2) as a dict:
grouped_df = df.groupby('name').agg({'score':'max'})
This produces:
name score
Name1 100
Name2 6
Name3 500
But what I would ideally like to get is
name score data1 data2
Name1 100 'b' 'f'
Name2 6 'e' 'g'
Name3 500 'f' 'n'
Or:
dict = {'b':1, 'e':1,'f':2,'g':1}
I attempted joining df and grouped_df by 2 columns (name and score), but that doesn't keep only the max, it returns all the columns. Is there any acceptable solution for this using dataframes?
Upvotes: 2
Views: 2808
Reputation: 11573
You'll need to join the two dataframes
joinedData = grouped_df.join(df, ['name', 'score'])
Upvotes: 3