Reputation: 161
I have a dataframe that contains some information about users. There is a column for user, column for type, and column for count, like this:
name type count
robert x 123
robert y 456
robert z 5123
charlie x 442123
charlie y 0
charlie z 42
I'm trying to figure out which type has the highest count per name, so for this case, I would want to select this:
name type count
robert z 5123
charlie x 442123
I know I can do something like this to get the max count per name, but I'm not sure how I can include the "type" column, which is actually the most important
df.sort_values('count', ascending=False).drop_duplicates('name').sort_index()
Upvotes: 10
Views: 15913
Reputation: 7942
(Very slightly) better yet, you can do:
df.loc[df.groupby('name')['count'].idxmax(),'type']
Upvotes: 0
Reputation: 91
Just sort on name and count, group by name and keep first.
df.sort_values(['name', 'count'],ascending=False).groupby(['name']).first().reset_index()
will give you:
name type count
3 charlie x 442123
2 robert z 5123
Upvotes: 5
Reputation: 3937
Try this
df.loc[df.groupby('name')['count'].idxmax()]['type']
name type count
3 charlie x 442123
2 robert z 5123
In case you want not just a single max value but the top n
values per group you can do (e.g. n = 2
)
df.loc[df.groupby('name')['count'].nlargest(2).index.get_level_values(1)]
name type count
3 charlie x 442123
5 charlie z 42
2 robert z 5123
1 robert y 456
Upvotes: 11
Reputation: 153560
What if you have two maxes for a name with different types:
print(df)
name type count
0 robert x 123
1 robert y 456
2 robert z 5123
3 robert a 5123
4 charlie x 442123
5 charlie y 0
6 charlie z 42
Use boolean indexing:
df[df['count'] == df.groupby('name')['count'].transform('max')]
Output:
name type count
2 robert z 5123
3 robert a 5123
4 charlie x 442123
Upvotes: 4