rainbow12
rainbow12

Reputation: 477

Merging several xlsx files into one

In a directory, I have several xlsx files, and I would like to put them in one single xlsx file. So I did this code:

all_data=pd.DataFrame()
for item_excell in arr2:
    if item_excell.endswith('.xlsx'):
        df=pd.read_excel(item_excell)
        all_data=all_data.append(df, ignore_index=True)

 all_data.to_excel('final_table.xlsx')

My previous excell have header, and I would like to keep them, because with this code, I don't get the header and the information gets mixed up.

Upvotes: 1

Views: 202

Answers (2)

Umar.H
Umar.H

Reputation: 23099

as others have noticed your all_data should be outside your for loop,

as taga has provided an answer, i'll show you how using Pathlib which was added into Python in version 3.4 and allows for really high level interactions with your file-system with minimal code.

from pathlib import Path
your_dir = r'c:\path\to\excels' #r for windows.

now we make use of glob which allows for wild-card matching to find all the excels in a given location using list comprehension

excels = [f for f in Path.(your_dir).glob('*.xlsx')] 

this is the same as

excels = []
for file in Path.(your_dir).glob('*.xlsx'):
    excels.append(file) 

now you can concat your dfs

dfs = [pd.read_excel(frame) for frame in excels)]
final_df = pd.concat(dfs)
# do stuff
final_df.to_excel('file_name.xlsx',index=False)

Upvotes: 0

taga
taga

Reputation: 3895

Try this:

For example, If you have 3 .xlsx files:

df1 = pd.read_excel('file 1.xlsx')
df2 = pd.read_excel('file 2.xlsx')
df3 = pd.read_excel('file 3.xlsx')

final_df = pd.concat([df1, df2, df3])
final_df.to_excel('combined file.xlsx')

Upvotes: 2

Related Questions