How to get the max out of a group by on two columns and sum on third in a pandas dataframe?

Question

I ran a group-by on a pandas DataFrame like this:

df.groupby(['year','month'])['AMT'].agg('sum')

And I get something like this:

year  month
2003  1            114.00
      2           9195.00
      3            300.00
      5            200.00
      6            450.00
      7             68.00
      8            750.00
      9           3521.00
      10           250.00
      11           799.00
      12          1000.00
2004  1           8551.00
      2           9998.00
      3          17334.00
      4           2525.00
      5          16014.00
      6           9132.00
      7          10623.00
      8           7538.00
      9           3650.00
      10          7733.00
      11         10128.00
      12          4741.00
2005  1           6965.00
      2           3208.00
      3           8630.00
      4           7776.00
      5          11950.00
      6          11717.00
      7           1510.00
              ...    
2015  7        1431441.00
      8         966974.00
      9        1121650.00
      10       1200104.00
      11       1312191.90
      12        482535.00
2016  1        1337343.00
      2        1465068.00
      3        1170113.00
      4        1121691.00
      5        1302936.00
      6        1518047.00
      7        1251844.00
      8         825215.00
      9        1491626.00
      10       1243877.00
      11       1632252.00
      12        750995.50
2017  1         905974.00
      2        1330182.00
      3        1382628.52
      4        1146789.00
      5        1201425.00
      6        1278701.00
      7        1172596.00
      8        1517116.50
      9        1108609.00
      10       1360841.00
      11       1340386.00
      12        860686.00

What I want is to keep only the month with the largest summed AMT in each year, so that the final DataFrame has one row per year, something like:

year  month
2003      2           9195.00
2004      3          17334.00
2005      5          11950.00

... and so on

What do I have to add to my group by aggregation to do this?

jezrael · Accepted Answer

I think you need SeriesGroupBy.idxmax, applied per year to the summed Series:

s = df.groupby(['year','month'])['AMT'].sum()
out = s.loc[s.groupby(level=0).idxmax()]
# in newer pandas versions you can also group by the index level name
#out = s.loc[s.groupby('year').idxmax()]
print (out)
year  month
2003  2         9195.0
2004  3        17334.0
2005  5        11950.0
Name: AMT, dtype: float64
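
To try the idxmax approach end to end, here is a minimal sketch on made-up rows. The values are toy data chosen so the monthly sums match a few lines of the output in the question, and the names `idx` and `result` are only illustrative. It assumes the columns are called year, month and AMT as in the question.

```python
import pandas as pd

# Toy rows (not the asker's real data); some months have several rows
# so that the sum step actually does something.
df = pd.DataFrame({
    "year":  [2003, 2003, 2003, 2004, 2004, 2004, 2005, 2005],
    "month": [1, 2, 2, 2, 3, 3, 4, 5],
    "AMT":   [114.0, 9000.0, 195.0, 9998.0, 17000.0, 334.0, 7776.0, 11950.0],
})

# Step 1: total AMT per (year, month); the result has a two-level index.
s = df.groupby(["year", "month"])["AMT"].sum()

# Step 2: for each year, the (year, month) label of the largest total.
idx = s.groupby(level="year").idxmax()

# Step 3: select those labels from the summed Series.
out = s.loc[idx]

# Optional: turn the result into a flat DataFrame with three columns.
result = out.reset_index()
print(result)
```

Note that idxmax returns a single label per group, so if two months tie for the maximum only the first one is kept.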

If a year can have several months tied for the maximum and you want to keep all of them:

out = s[s == s.groupby(level=0).transform('max')]
print (out)
year  month
2003  2         9195.0
2004  3        17334.0
2005  5        11950.0
Name: AMT, dtype: float64
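
The sample data has no ties, so both approaches print the same thing. A small sketch with an invented tie (2003 has two months with the same total, which is not the case in the question's data) shows where they differ. The names `yearly_max`, `ties` and `first_only` are only illustrative.

```python
import pandas as pd

# Already-summed toy Series with a deliberate tie in 2003.
s = pd.Series(
    [114.0, 9195.0, 9195.0, 9998.0, 17334.0],
    index=pd.MultiIndex.from_tuples(
        [(2003, 1), (2003, 2), (2003, 5), (2004, 2), (2004, 3)],
        names=["year", "month"],
    ),
    name="AMT",
)

# transform('max') broadcasts each year's maximum back onto every row,
# so the comparison keeps every month that reaches it.
yearly_max = s.groupby(level="year").transform("max")
ties = s[s == yearly_max]

# idxmax keeps only the first month that reaches the maximum.
first_only = s.loc[s.groupby(level="year").idxmax()]

print(ties)
print(first_only)
```

Because the comparison uses exact equality, totals that differ only by floating-point rounding will not count as ties.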
