ah bon
ah bon

Reputation: 10051

Read excel and reformat the multi-index headers in Pandas

Given a excel file with format as follows:

enter image description here

Reading with pd.read_clipboard, I get:

   year      2018 Unnamed: 2      2019 Unnamed: 4
0  city  quantity      price  quantity      price
1    bj        10          2         4          7
2    sh         6          8         3          4

Just wondering if it's possible to convert to the following format with Pandas:

   year city  quantity  price
0  2018   bj        10      2
1  2019   bj         4      7
2  2018   sh         6      8
3  2019   sh         3      4

Upvotes: 1

Views: 1828

Answers (1)

jezrael
jezrael

Reputation: 863266

I think here is best convert excel file to DataFrame with MultiIndex in columns and first column as index:

df = pd.read_excel(file, header=[0,1], index_col=[0])

print (df)
year     2018           2019      
city quantity price quantity price
bj         10     2        4     7
sh          6     8        3     4

print (df.columns)
MultiIndex([('2018', 'quantity'),
            ('2018',    'price'),
            ('2019', 'quantity'),
            ('2019',    'price')],
           names=['year', 'city'])

Then reshape by DataFrame.stack, change order of levels by DataFrame.swaplevel, set index and columns names by DataFrame.rename_axis and last convert index to columns, and if encessary convert year to integers:

df1 = (df.stack(0)
         .swaplevel(0,1)
         .rename_axis(index=['year','city'], columns=None)
         .reset_index()
         .assign(year=lambda x: x['year'].astype(int)))
print (df1)
   year city  price  quantity
0  2018   bj      2        10
1  2019   bj      7         4
2  2018   sh      8         6
3  2019   sh      4         3

Upvotes: 4

Related Questions