brandog
brandog

Reputation: 1527

Looping through multiple excel files in python using pandas

I know this type of question is asked all the time. But I am having trouble figuring out the best way to do this.

I wrote a script that reformats a single excel file using pandas. It works great.

Now I want to loop through multiple excel files, preform the same reformat, and place the newly reformatted data from each excel sheet at the bottom, one after another.

I believe the first step is to make a list of all excel files in the directory. There are so many different ways to do this so I am having trouble finding the best way.

Below is the code I currently using to import multiple .xlsx and create a list.

import os
import glob

os.chdir('C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
         print(FileList)

I am not sure if the previous glob code actually created the list that I need.

Then I have trouble understanding where to go from there. The code below fails at pd.ExcelFile(File) I beleive I am missing something....

# create for loop
for File in FileList:
    for x in File:
# Import the excel file and call it xlsx_file
xlsx_file = pd.ExcelFile(File)
xlsx_file
# View the excel files sheet names
xlsx_file.sheet_names
# Load the xlsx files Data sheet as a dataframe
df = xlsx_file.parse('Data',header= None)
# select important rows,
df_NoHeader = df[4:]
#then It does some more reformatting.
'

Any help is greatly appreciated

Upvotes: 1

Views: 31013

Answers (2)

brandog
brandog

Reputation: 1527

I solved my problem. Instead of using the glob function I used the os.listdir to read all my excel sheets, loop through each excel file, reformat, then append the final data to the end of the table.

#first create empty appended_data table to store the info.
appended_data = []


for WorkingFile in os.listdir('C:\ExcelFiles'):
     if os.path.isfile(WorkingFile):

        # Import the excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(WorkingFile)
        # View the excel files sheet names
        xlsx_file.sheet_names
        # Load the xlsx files Data sheet as a dataframe
        df = xlsx_file.parse('sheet1',header= None)

        #.... do so reformating, call finished sheet reformatedDataSheet
        reformatedDataSheet
        appended_data.append(reformatedDataSheet)
appended_data = pd.concat(appended_data)

And thats it, it does everything I wanted.

Upvotes: 4

Corley Brigman
Corley Brigman

Reputation: 12391

you need to change

os.chdir('C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
         print(FileList)

to just

os.chdir('C:\ExcelWorkbooksFolder')
FileList = glob.glob('*.xlsx')
print(FileList)

Why does this fix it? glob returns a single list. Since you put for FileList in glob.glob(...), you're going to walk that list one by one and put the result into FileList. At the end of your loop, FileList is a single filename - a single string.

When you do this code:

for File in FileList:
    for x in File:

the first line will assign File to the first character of the last filename (as a string). The second line will assign x to the first (and only) character of File. This is not likely to be a valid filename, so it throws an error.

Upvotes: 3

Related Questions