Recombine sheets with multiple level of headers, and write out an excel file with different sheets named by one level of header

Question

Suppose we have an excel file (you can download sample data from this link), which contains several sheets with three-layer headers. At present, the data is divided into different sheets according to the Indicator Name.

      Country Name             Mexico            Moldova
0   Indicator Name            PPP_GDP            PPP_GDP
1   Indicator Code  NY.GDP.PCAP.PP.CD  NY.GDP.PCAP.PP.CD
2             1988                NaN                NaN
3             1989               8.9                 NaN
4             1990               9.9              102.5 
5             1991               9.9              103.4 
6             1992               9.8              105.4 
7             1993               9.7              101.6 
8             1994               9.7              101.2 
9             1995               9.6              100.2 
10            1996               9.5               99.8 
11            1997               9.4               99.2 
12            1998               9.2               99.3 
13            1999               9.0               99.4 
14            2000                NaN                NaN

Now I hope to re-arrange them into different sheets according to the Country Name, how to do it?

My attempt is to loop each sheet, merge them horizontally into a large sheet, and then regroup it according to the Country Name.

sheets = [PPP_GDP, CPI, PPI]
final = []
for sheet in sheets:
    df = pd.read_excel('./test_data_2022-10-25.xlsx', sheet_name=sheet, header=[0, 1, 2])
    final.append(df)
    
excel_merged = pd.concat(final, ignore_index=True)
excel_merged.to_excel('./output.xlsx')

Out:

So the question is how could we split dataframe to four sheets as follows:

Note that I gave the expected result in the sample data: a new excel file containing Mexico, Moldova, Nepal and Israel as sheets.

jezrael · Accepted Answer

Create index by first column by index_col=0 in read_excel, join by pd.concat with axis=1 and last create sheetnames by country names form first level of MultiIndex:

sheets = ['PPP_GDP', 'CPI', 'PPI']
df = pd.concat([pd.read_excel('./test_data_2022-10-25.xlsx', 
                              sheet_name=sheet, 
                              header=[0, 1, 2], 
                              index_col=0) 
                 for sheet in sheets], axis=1)


writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')

for name, g in df.groupby(level=0, axis=1):
    g.to_excel(writer, sheet_name=name)

writer.save()

Recombine sheets with multiple level of headers, and write out an excel file with different sheets named by one level of header

Answers (1)

Related Questions