Reputation: 67
So I tried combining multiple excel (around 30 different ones) into one single excel using python. Most of the excels looks something like this:
title1 title2 title3 title4
name price gender true or false
name price gender true or false
... and so on
Here is the code I use to combine the excel and make it into one data frame:
import os
import pandas as pd
cwd = os.path.abspath('')
files = os.listdir(cwd)
df = pd.DataFrame()
for file in files:
if file.endswith('.XLSX'):
df = df.append(pd.read_excel(file), ignore_index=True)
print(df)
However, some of the excels are out of order like so (some of them have other format, the below one is just an example):
title3 title2 title1 title4
gender price name true or false
gender price name true or false
... and so on
So when combining using the code above, the result data frame will be messed up.
I wish the result to look like this despite the different excel formatting:
title1 title2 title3 title4
name price gender true or false
name price gender true or false
... and so on
Can someone help me fix the code? Thank you in advance.
Upvotes: 0
Views: 38
Reputation: 120539
Use pd.concat
to merge all your data (and pathlib
to get file names):
import pandas as pd
import pathlib
cwd = pathlib.Path.cwd()
files = cwd.glob('*.XLSX')
# desktop_dir = pathlib.Path(r'C:/users/xxx/Desktop')
# files = desktop_dir.glob('*.XLSX')
data = []
for file in files:
df = pd.read_excel(file)
data.append(df)
df = pd.concat(data)
>>> df
title3 title2 title1 title4
0 gender price name False
1 gender price name False
0 gender price name True
1 gender price name True
Upvotes: 1
Reputation: 1475
Instead df.append
try using df = pd.concat(df, pd.read_excel(file))
This will append the two dataframes one after the other taking columns into account
Upvotes: 0