Reputation: 861
I have 40 or so excel documents and I want to read the first sheet into a Dataframe and then export the combined sheets to a csv file. The below code works so far, but I also need to add a column that has the imported sheet name. The sheet name is different for each workbook. I basically want to replace 'WorsksheetName' below with the actual sheet name.
import pandas as pd
import numpy as np
import glob
import openpyxl
glob.glob("..\*.xlsx")
all_data = pd.DataFrame()
for f in glob.glob("M:\Completed\*.xlsx"):
df = pd.read_excel(f,sheetname=1)
df['Sheet'] = 'WorksheetName'
all_data = all_data.append(df,ignore_index=True)
all_data.to_csv('Workoad.csv')
Upvotes: 0
Views: 3461
Reputation: 2369
You could use pd.ExcelFile
to retrieve the sheet name, as below.
import pandas as pd
import glob
all_data = []
for f in glob.glob("M:\Completed\*.xlsx"):
xl = pd.ExcelFile(f)
sheet = xl.sheet_names[0]
df = xl.parse(sheet)
df['Sheet'] = sheet
all_data.append(df)
pd.concat(all_data).to_csv('Workoad.csv')
The change to using pd.concat
instead of df.append
should speed things up some – this way you only need to build a new dataframe once, instead of forty times, once for each you add. I also changed the index selected. Feel free to change that to sheet = xl.sheet_names[1]
if you do in fact want the second sheet; python and its packages generally count from zero, even when working with programs that count from 1, like excel.
Upvotes: 0
Reputation: 10359
If you use the setting sheetname=None
, pandas imports all sheets of the workbook into a dictionary, where the key is the sheet name and the value is the dataframe of the worksheet itself. Using this, you could do the following:
import pandas as pd
import numpy as np
import glob
import openpyxl
all_data = pd.DataFrame()
for f in glob.glob("M:\Completed\*.xlsx"):
sheets_dict = pd.read_excel(f, sheetname=None)
for name, frame in sheets_dict.items():
frame['Sheet'] = name
all_data = all_data.append(frame, ignore_index=True)
all_data.to_csv('Workload.csv')
Your current setting seems to only keep the second sheet of the workbook - you could do this by using some kind of filter on name
.
Upvotes: 1