Export Dataframe with Sheetname as Column

Question

I have 40 or so Excel workbooks. I want to read the first sheet of each into a DataFrame and then export the combined sheets to a CSV file. The code below works so far, but I also need to add a column containing the name of the imported sheet. The sheet name is different for each workbook. I basically want to replace 'WorksheetName' below with the actual sheet name.

import pandas as pd
import numpy as np
import glob 
import openpyxl
glob.glob("..\*.xlsx")
all_data = pd.DataFrame()
for f in glob.glob("M:\Completed\*.xlsx"):
        df = pd.read_excel(f,sheetname=1)
        df['Sheet'] = 'WorksheetName'
        all_data = all_data.append(df,ignore_index=True)
all_data.to_csv('Workload.csv')

asongtoruin · Accepted Answer

If you pass sheet_name=None (spelled sheetname in pandas versions before 0.21), pandas imports every sheet of the workbook into a dictionary, where each key is a sheet name and each value is the DataFrame for that worksheet. Using this, you could do the following:

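import glob

import pandas as pd

frames = []
# Raw string so the backslashes in the Windows path are not read as escapes
for f in glob.glob(r"M:\Completed\*.xlsx"):
    # sheet_name=None returns a dict of {sheet name: DataFrame}
    # (the keyword was spelled sheetname before pandas 0.21)
    sheets_dict = pd.read_excel(f, sheet_name=None)
    for name, frame in sheets_dict.items():
        frame['Sheet'] = name
        frames.append(frame)

# DataFrame.append was removed in pandas 2.0, so combine once with pd.concat
all_data = pd.concat(frames, ignore_index=True)
all_data.to_csv('Workload.csv')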

Your current setting, sheetname=1, seems to keep only the second sheet of each workbook, since sheet positions are zero-based. You could reproduce that with the approach above by filtering on name inside the inner loop.
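Since the question asks for only the first sheet of each workbook, here is a minimal sketch of one way to get that sheet together with its name. It assumes a pandas version that provides pd.ExcelFile with sheet_names and parse, plus an installed engine such as openpyxl for .xlsx files. The helper names first_sheet_with_name and combine_first_sheets are made up for illustration and are not part of pandas.

```python
import glob

import pandas as pd


def first_sheet_with_name(path):
    """Read the first worksheet of `path` and record its name in a 'Sheet' column."""
    # ExcelFile lists sheet names in workbook order, so index 0 is the first sheet
    with pd.ExcelFile(path) as workbook:
        name = workbook.sheet_names[0]
        frame = workbook.parse(name)
    frame["Sheet"] = name
    return frame


def combine_first_sheets(pattern):
    """Stack the first sheet of every workbook matching the glob `pattern`."""
    # sorted() makes the row order independent of the filesystem's listing order
    frames = [first_sheet_with_name(f) for f in sorted(glob.glob(pattern))]
    # pd.concat raises on an empty list, so fall back to an empty DataFrame
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With the path from the question, this could be called as combine_first_sheets(r"M:\Completed\*.xlsx").to_csv('Workload.csv'). Opening each workbook through ExcelFile reads the sheet names without loading every sheet into memory, which may matter when the workbooks are large.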
