Reputation: 477
In a directory, I have several xlsx files, and I would like to put them in one single xlsx file. So I did this code:
all_data=pd.DataFrame()
for item_excell in arr2:
if item_excell.endswith('.xlsx'):
df=pd.read_excel(item_excell)
all_data=all_data.append(df, ignore_index=True)
all_data.to_excel('final_table.xlsx')
My previous excell have header, and I would like to keep them, because with this code, I don't get the header and the information gets mixed up.
Upvotes: 1
Views: 202
Reputation: 23099
as others have noticed your all_data
should be outside your for loop,
as taga has provided an answer, i'll show you how using Pathlib
which was added into Python in version 3.4 and allows for really high level interactions with your file-system with minimal code.
from pathlib import Path
your_dir = r'c:\path\to\excels' #r for windows.
now we make use of glob
which allows for wild-card matching to find all the excels in a given location using list comprehension
excels = [f for f in Path.(your_dir).glob('*.xlsx')]
this is the same as
excels = []
for file in Path.(your_dir).glob('*.xlsx'):
excels.append(file)
now you can concat your dfs
dfs = [pd.read_excel(frame) for frame in excels)]
final_df = pd.concat(dfs)
# do stuff
final_df.to_excel('file_name.xlsx',index=False)
Upvotes: 0
Reputation: 3895
Try this:
For example, If you have 3 .xlsx
files:
df1 = pd.read_excel('file 1.xlsx')
df2 = pd.read_excel('file 2.xlsx')
df3 = pd.read_excel('file 3.xlsx')
final_df = pd.concat([df1, df2, df3])
final_df.to_excel('combined file.xlsx')
Upvotes: 2