Reputation: 1527
I know this type of question is asked all the time. But I am having trouble figuring out the best way to do this.
I wrote a script that reformats a single excel file using pandas. It works great.
Now I want to loop through multiple excel files, preform the same reformat, and place the newly reformatted data from each excel sheet at the bottom, one after another.
I believe the first step is to make a list of all excel files in the directory. There are so many different ways to do this so I am having trouble finding the best way.
Below is the code I currently using to import multiple .xlsx and create a list.
import os
import glob
os.chdir('C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
print(FileList)
I am not sure if the previous glob code actually created the list that I need.
Then I have trouble understanding where to go from there.
The code below fails at pd.ExcelFile(File)
I beleive I am missing something....
# create for loop
for File in FileList:
for x in File:
# Import the excel file and call it xlsx_file
xlsx_file = pd.ExcelFile(File)
xlsx_file
# View the excel files sheet names
xlsx_file.sheet_names
# Load the xlsx files Data sheet as a dataframe
df = xlsx_file.parse('Data',header= None)
# select important rows,
df_NoHeader = df[4:]
#then It does some more reformatting.
'
Any help is greatly appreciated
Upvotes: 1
Views: 31013
Reputation: 1527
I solved my problem. Instead of using the glob function I used the os.listdir to read all my excel sheets, loop through each excel file, reformat, then append the final data to the end of the table.
#first create empty appended_data table to store the info.
appended_data = []
for WorkingFile in os.listdir('C:\ExcelFiles'):
if os.path.isfile(WorkingFile):
# Import the excel file and call it xlsx_file
xlsx_file = pd.ExcelFile(WorkingFile)
# View the excel files sheet names
xlsx_file.sheet_names
# Load the xlsx files Data sheet as a dataframe
df = xlsx_file.parse('sheet1',header= None)
#.... do so reformating, call finished sheet reformatedDataSheet
reformatedDataSheet
appended_data.append(reformatedDataSheet)
appended_data = pd.concat(appended_data)
And thats it, it does everything I wanted.
Upvotes: 4
Reputation: 12391
you need to change
os.chdir('C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
print(FileList)
to just
os.chdir('C:\ExcelWorkbooksFolder')
FileList = glob.glob('*.xlsx')
print(FileList)
Why does this fix it? glob
returns a single list. Since you put for FileList in glob.glob(...)
, you're going to walk that list one by one and put the result into FileList
. At the end of your loop, FileList
is a single filename - a single string.
When you do this code:
for File in FileList:
for x in File:
the first line will assign File
to the first character of the last filename (as a string). The second line will assign x
to the first (and only) character of File
. This is not likely to be a valid filename, so it throws an error.
Upvotes: 3