Pseudo

Reputation: 66

Grouping DataFrame to get max index

I have a DataFrame similar to this:

    Year    class   Sales
0   1980    1   1.10
1   1980    2   7.07
2   1980    3   8.00
3   1980    4   12.00
4   1981    1   11.20
5   1981    1   2.00
6   1981    3   4.00
7   1981    2   6.00

I want to group the data so that, for each year, I get the maximum total sales across classes together with the class that achieved it. After the groupby below, what do I need to do next?

data.groupby(['Year','class']).sum()
        Sales
Year    class   
1980    1   1.10
        2   7.07
        3   8.00
        4   12.00
1981    1   13.20
        2   6.00
        3   4.00

e.g. I want my output to be something like this:

            Sales
Year class
1980 4      12.00
1981 1      13.20

I was able to get the correct output using loops, but I'm trying to avoid that because it takes too much time.

Upvotes: 0

Views: 35

Answers (1)

DSM

Reputation: 353099

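One approach: after your first groupby (the result is called d2 below), group a second time on Year alone (index level 0) and call idxmax to get the index labels of the maximum sales in each year. Passing those labels to .loc then selects the corresponding rows: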

In [41]: d2
Out[41]: 
            Sales
Year class       
1980 1       1.10
     2       7.07
     3       8.00
     4      12.00
1981 1      13.20
     2       6.00
     3       4.00

In [42]: d2["Sales"].groupby(level=0).idxmax()
Out[42]: 
Year
1980    (1980, 4)
1981    (1981, 1)
Name: Sales, dtype: object

In [43]: d2.loc[d2["Sales"].groupby(level=0).idxmax()]
Out[43]: 
            Sales
Year class       
1980 4       12.0
1981 1       13.2
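
For reference, here is the same idea as a self-contained sketch that can be run outside an interactive session. The frame is rebuilt from the sample in the question; the names `top_idx` and `result` are illustrative only and do not appear in the original session, and converting the labels to a plain list before `.loc` is just a cautious choice on my part.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "Year":  [1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981],
    "class": [1, 2, 3, 4, 1, 1, 3, 2],
    "Sales": [1.10, 7.07, 8.00, 12.00, 11.20, 2.00, 4.00, 6.00],
})

# Step 1: total sales per (Year, class); the result has a two-level index.
d2 = data.groupby(["Year", "class"]).sum()

# Step 2: within each Year (index level 0), find the (Year, class) label
# of the row with the largest total.
top_idx = d2["Sales"].groupby(level=0).idxmax()

# Step 3: select those rows. The labels are passed as a plain list of tuples.
result = d2.loc[top_idx.tolist()]
print(result)
```

If you would rather have Year and class as ordinary columns than as index levels, calling `result.reset_index()` afterwards should do it.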

Upvotes: 1
