Dread
Dread

Reputation: 861

Export Dataframe with Sheetname as Column

I have 40 or so excel documents and I want to read the first sheet into a Dataframe and then export the combined sheets to a csv file. The below code works so far, but I also need to add a column that has the imported sheet name. The sheet name is different for each workbook. I basically want to replace 'WorsksheetName' below with the actual sheet name.

import pandas as pd
import numpy as np
import glob 
import openpyxl
glob.glob("..\*.xlsx")
all_data = pd.DataFrame()
for f in glob.glob("M:\Completed\*.xlsx"):
        df = pd.read_excel(f,sheetname=1)
        df['Sheet'] = 'WorksheetName'
        all_data = all_data.append(df,ignore_index=True)
all_data.to_csv('Workoad.csv')

Upvotes: 0

Views: 3461

Answers (2)

EFT
EFT

Reputation: 2369

You could use pd.ExcelFile to retrieve the sheet name, as below.

import pandas as pd
import glob 
all_data = []
for f in glob.glob("M:\Completed\*.xlsx"):
        xl = pd.ExcelFile(f)
        sheet = xl.sheet_names[0]
        df = xl.parse(sheet)
        df['Sheet'] = sheet
        all_data.append(df)
pd.concat(all_data).to_csv('Workoad.csv')

The change to using pd.concat instead of df.append should speed things up some – this way you only need to build a new dataframe once, instead of forty times, once for each you add. I also changed the index selected. Feel free to change that to sheet = xl.sheet_names[1] if you do in fact want the second sheet; python and its packages generally count from zero, even when working with programs that count from 1, like excel.

Upvotes: 0

asongtoruin
asongtoruin

Reputation: 10359

If you use the setting sheetname=None, pandas imports all sheets of the workbook into a dictionary, where the key is the sheet name and the value is the dataframe of the worksheet itself. Using this, you could do the following:

import pandas as pd
import numpy as np
import glob 
import openpyxl

all_data = pd.DataFrame()
for f in glob.glob("M:\Completed\*.xlsx"):
    sheets_dict = pd.read_excel(f, sheetname=None)
    for name, frame in sheets_dict.items():
        frame['Sheet'] = name
        all_data = all_data.append(frame, ignore_index=True)

all_data.to_csv('Workload.csv')

Your current setting seems to only keep the second sheet of the workbook - you could do this by using some kind of filter on name.

Upvotes: 1

Related Questions