zhangruibo101
zhangruibo101

Reputation: 67

How to use pandas to import and reformat multiple excels?

So I tried combining multiple excel (around 30 different ones) into one single excel using python. Most of the excels looks something like this:

title1 title2 title3  title4
name   price  gender  true or false
name   price  gender  true or false
... and so on 

Here is the code I use to combine the excel and make it into one data frame:

    import os
    import pandas as pd
    cwd = os.path.abspath('') 
    files = os.listdir(cwd)  
    df = pd.DataFrame()
    for file in files:
        if file.endswith('.XLSX'):
            df = df.append(pd.read_excel(file), ignore_index=True)
    print(df) 

However, some of the excels are out of order like so (some of them have other format, the below one is just an example):

title3 title2 title1  title4
gender price  name    true or false
gender price  name    true or false
... and so on

So when combining using the code above, the result data frame will be messed up.

I wish the result to look like this despite the different excel formatting:

title1 title2 title3  title4
name   price  gender  true or false
name   price  gender  true or false
... and so on 

Can someone help me fix the code? Thank you in advance.

Upvotes: 0

Views: 38

Answers (2)

Corralien
Corralien

Reputation: 120539

Use pd.concat to merge all your data (and pathlib to get file names):

import pandas as pd
import pathlib

cwd = pathlib.Path.cwd()
files = cwd.glob('*.XLSX')

# desktop_dir = pathlib.Path(r'C:/users/xxx/Desktop')
# files = desktop_dir.glob('*.XLSX')

data = []
for file in files:
    df = pd.read_excel(file)
    data.append(df)

df = pd.concat(data)
>>> df
   title3 title2 title1  title4
0  gender  price   name   False
1  gender  price   name   False
0  gender  price   name    True
1  gender  price   name    True

Upvotes: 1

Ananay Mital
Ananay Mital

Reputation: 1475

Instead df.append try using df = pd.concat(df, pd.read_excel(file))

This will append the two dataframes one after the other taking columns into account

Upvotes: 0

Related Questions