Looping through multiple excel files in python using pandas

Question

I know this type of question is asked all the time. But I am having trouble figuring out the best way to do this.

I wrote a script that reformats a single excel file using pandas. It works great.

Now I want to loop through multiple excel files, preform the same reformat, and place the newly reformatted data from each excel sheet at the bottom, one after another.

I believe the first step is to make a list of all excel files in the directory. There are so many different ways to do this so I am having trouble finding the best way.

Below is the code I currently using to import multiple .xlsx and create a list.

import os
import glob

os.chdir('C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
         print(FileList)

I am not sure if the previous glob code actually created the list that I need.

Then I have trouble understanding where to go from there. The code below fails at pd.ExcelFile(File) I beleive I am missing something....

# create for loop
for File in FileList:
    for x in File:
# Import the excel file and call it xlsx_file
xlsx_file = pd.ExcelFile(File)
xlsx_file
# View the excel files sheet names
xlsx_file.sheet_names
# Load the xlsx files Data sheet as a dataframe
df = xlsx_file.parse('Data',header= None)
# select important rows,
df_NoHeader = df[4:]
#then It does some more reformatting.
'

Any help is greatly appreciated

brandog · Accepted Answer

I solved my problem. Instead of using the glob function I used the os.listdir to read all my excel sheets, loop through each excel file, reformat, then append the final data to the end of the table.

#first create empty appended_data table to store the info.
appended_data = []


for WorkingFile in os.listdir('C:\ExcelFiles'):
     if os.path.isfile(WorkingFile):

        # Import the excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(WorkingFile)
        # View the excel files sheet names
        xlsx_file.sheet_names
        # Load the xlsx files Data sheet as a dataframe
        df = xlsx_file.parse('sheet1',header= None)

        #.... do so reformating, call finished sheet reformatedDataSheet
        reformatedDataSheet
        appended_data.append(reformatedDataSheet)
appended_data = pd.concat(appended_data)

And thats it, it does everything I wanted.

Looping through multiple excel files in python using pandas

Answers (2)

Related Questions