Reputation: 6869
I am trying to understand/use the groupby in Pandas to obtain the maximum count within each group and then return the row with the max count
df = pd.DataFrame({'name':['alpha','beta','gamma','delta','epsilon'], 'year':[2000,2001,2000,2001,2000], 'count':[1,25,11,20,8]})
df
count name year
0 1 alpha 2000
1 25 beta 2001
2 11 gamma 2000
3 20 delta 2001
4 8 epsilon 2000
df.groupby('year').max()
count name
year
2000 11 gamma
2001 25 delta
I am grouping by year but I'd like the output to look like:
count name
year
2000 11 gamma
2001 25 beta
So, I'd like to return the entire row from each group (by 'year') based on the maximum count. It also isn't clear to me why the year 2001 returns "delta" in the name rather than "beta" (as I would expect).
Upvotes: 1
Views: 354
Reputation: 879291
You can use idxmax
to collect the index of the maximum row, and then use df.iloc
to select the row:
In [73]: df.iloc[df.groupby('year')['count'].idxmax()].set_index('year')
Out[73]:
count name
year
2000 11 gamma
2001 25 beta
Note that df.groupby('year').max()
is finding the max
for each column -- both the count
and the name
. That is why delta
was getting selected as a max
in the name
column.
If you are only interested in the max count, then use
df.groupby('year')['count'].max()
df.groupby('year')
is a DataFrameGroupBy
object.
df.groupby('year')['count']
is a SeriesGroupBy
object. The call to max
or idxmax
on the SeriesGroupBy
object uses values only from the count
column. This method of restricting the GroupBy object is documented here.
Upvotes: 1
Reputation: 1
You can use the general groupby().apply
and return the rows whose count
is equal to the group's maximum count
. As in the other answer, the set_index
is used to get rid of the extra year
column.
>>> df = pd.DataFrame({'name':['alpha','beta','gamma','delta','epsilon'], 'year':[2000,2001,2000,2001,2000], 'count':[1,25,11,20,8]}) >>> df.groupby("year").apply(lambda g: g[g["count"] == g["count"].max()]).set_index("year") count name year 2000 11 gamma 2001 25 beta
Upvotes: 0