Reputation: 66
I have a dataframe like this:
   Year  class  Sales
0  1980      1   1.10
1  1980      2   7.07
2  1980      3   8.00
3  1980      4  12.00
4  1981      1  11.20
5  1981      1   2.00
6  1981      3   4.00
7  1981      2   6.00
I want to group the data so that, for each year, I get the maximum total sales of any class together with the class that achieved it. So far I have the grouped sums below. What do I need to do after this?
data.groupby(['Year','class']).sum()
            Sales
Year class
1980 1       1.10
     2       7.07
     3       8.00
     4      12.00
1981 1      13.20
     2       6.00
     3       4.00
For example, I want my output to look like this:
            Sales
Year class
1980 4      12.00
1981 1      13.20
I was able to get the correct output using loops, but I'm trying to avoid that because it takes too much time.
Upvotes: 0
Views: 35
Reputation: 353099
One approach: after your first groupby, do a second groupby on just the Year (index level 0) and use idxmax to find the index labels of the maximum sales in each year. You can then pass those labels to .loc to select the matching rows:
In [41]: d2
Out[41]:
            Sales
Year class
1980 1       1.10
     2       7.07
     3       8.00
     4      12.00
1981 1      13.20
     2       6.00
     3       4.00
In [42]: d2["Sales"].groupby(level=0).idxmax()
Out[42]:
Year
1980 (1980, 4)
1981 (1981, 1)
Name: Sales, dtype: object
In [43]: d2.loc[d2["Sales"].groupby(level=0).idxmax()]
Out[43]:
            Sales
Year class
1980 4       12.0
1981 1       13.2
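For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. It rebuilds the question's sample data by hand. The names `best` and `result` are just illustrative, and converting the idxmax labels to a plain list before passing them to .loc is my own precaution, not something the session above requires.

```python
import pandas as pd

# Sample data copied from the question
data = pd.DataFrame({
    "Year":  [1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981],
    "class": [1, 2, 3, 4, 1, 1, 3, 2],
    "Sales": [1.10, 7.07, 8.00, 12.00, 11.20, 2.00, 4.00, 6.00],
})

# Step 1: total sales per (Year, class), as in the question
d2 = data.groupby(["Year", "class"]).sum()

# Step 2: within each Year (index level 0), find the (Year, class)
# label of the row with the largest total
best = d2["Sales"].groupby(level=0).idxmax()

# Step 3: select those rows; a list of tuples addresses MultiIndex rows
result = d2.loc[best.tolist()]
print(result)
```

This avoids explicit Python loops: both the summing and the per-year maximum are handled by groupby.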
Upvotes: 1