Reputation: 153
Using Python, I want to merge all .xls files in a directory into one data frame and save that as a new concatenated .xls file. The .xls files will have an unknown number of columns and no consistent headers.
I've used other suggestions on this forum and ended up with this:
import os
import pandas as pd
path = os.getcwd()
files = os.listdir(path)
files_xls = [f for f in files if f[-3:] == 'xls']
df = pd.DataFrame()
for f in files_xls:
data = pd.read_excel(f for f in files_xls) # I dont understand what to add
# in the parentheses here.
df = df.append(data)
df
I'm getting these errors:
File "<ipython-input-17-bb67a423cf40>", line 14, in <module>
data = pd.read_excel(f for f in files_xls)
File "C:\Users\xxxx\Anaconda2\lib\site-packages\pandas\io\excel.py", line 170, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Users\xxxx\Anaconda2\lib\site-packages\pandas\io\excel.py", line 229, in __init__
raise ValueError('Must explicitly set engine if not passing in'
ValueError: Must explicitly set engine if not passing in buffer or path for io.
Upvotes: 0
Views: 113
Reputation: 19395
try this brother
df = []
for f in files_xls:
data = pd.read_excel(f)
df = df.append(data)
mydf = pd.concat(df, axis = 0)
Upvotes: 1