idxmax is not working in case of pivot_table - Pandas

Question

Dataset looks like this (within pandas dataframe)

   Month  Year  Money
0    Jan  2002    615
1    Feb  2002    756
2    Mar  2002    455
3    Apr  2002    645
4    May  2002    669
5    Jun  2002    913
6    Jul  2002    157
7    Aug  2002    217
8    Sep  2002    985
9    Oct  2002    321
10   Nov  2002    847
11   Dec  2002    179
12   Jan  2003    329
13   Feb  2003    717
14   Mar  2003    278
15   Apr  2003    709
16   May  2003    995

So, I tried pivot

data = df.pivot('Month', 'Year', 'Money')

Got result like this:

Year   2002  2003  2004  2005
Month                        
Apr     645   709   178   800
Aug     217   867   515   748
Dec     179   230   121   905
Feb     756   717   879   772
Jan     615   329   896   108
Jul     157   391   429   699
Jun     913   887   422   537
Mar     455   278   934   906
May     669   995   726   324
Nov     847   536   151   195
Oct     321   950   278   173
Sep     985   459   915   437

The intension is to assign 'Year' with highest value in a separate column.

So, I tried this.

data['Max'] = data[['2002, 2003, 2004, 2005']].idxmax(axis=1)

This worked before on simple dataframe. But after applying pivot it's showing me this:

KeyError                                  Traceback (most recent call last)
 in ()
----> 1 data['Max'] = data[['2002, 2003, 2004, 2005']].idxmax(axis=1)
      2 data.head()

2 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1638             if missing == len(indexer):
   1639                 axis_name = self.obj._get_axis_name(axis)
-> 1640                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1641 
   1642             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Index(['2002, 2003, 2004, 2005'], dtype='object', name='Year')] are in the [columns]"

So, I tried resetting the index by

data=pd.pivot_table(df,index=['Month'],columns='Year',values='Money',aggfunc='sum').reset_index()

Result looks like this:

Year Month  2002  2003  2004  2005
0      Apr   645   709   178   800
1      Aug   217   867   515   748
2      Dec   179   230   121   905
3      Feb   756   717   879   772
4      Jan   615   329   896   108
5      Jul   157   391   429   699
6      Jun   913   887   422   537
7      Mar   455   278   934   906
8      May   669   995   726   324
9      Nov   847   536   151   195
10     Oct   321   950   278   173
11     Sep   985   459   915   437

Then again I applied the same code:

data['Max'] = data[['2002, 2003, 2004, 2005']].idxmax(axis=1)

Same Error!

KeyError: "None of [Index(['2002, 2003, 2004, 2005'], dtype='object', name='Year')] are in the [columns]"

print(data.columns) showed Index(['Month', 2002, 2003, 2004, 2005], dtype='object', name='Year')

What did I miss?

Quang Hoang · Accepted Answer

I think you want:

data['Max'] = data.idxmax(axis=1)

Or if you want specific year:

data['Max'] = data[[2002,2003,2004,2005]].idxmax(axis=1)

if your Year are integers, else:

data['Max'] = data[['2002','2003','2004','2005']].idxmax(axis=1)

Instead of indexing with a big string '2002, 2003, 2004, 2005'.

Output:

Year   2002  2003  2004  2005   Max
Month                              
Apr     645   709   178   800  2005
Aug     217   867   515   748  2003
Dec     179   230   121   905  2005
Feb     756   717   879   772  2004
Jan     615   329   896   108  2004
Jul     157   391   429   699  2005
Jun     913   887   422   537  2002
Mar     455   278   934   906  2004
May     669   995   726   324  2003
Nov     847   536   151   195  2002
Oct     321   950   278   173  2003
Sep     985   459   915   437  2002

idxmax is not working in case of pivot_table - Pandas

Answers (1)

Related Questions