Dance Party2

Reputation: 7536

pandas Combine Excel Spreadsheets

I have an Excel workbook with many tabs. Each tab has the same set of headers as all others. I want to combine all of the data from each tab into one data frame (without repeating the headers for each tab).

So far, I've tried:

import pandas as pd
xl = pd.ExcelFile('file.xlsx')
df = xl.parse()

Can I pass something as the parse argument that means "all spreadsheets"? Or is this the wrong approach?

Thanks in advance!

Update: I tried:

a=xl.sheet_names
b = pd.DataFrame()
for i in a:
    b.append(xl.parse(i))
b

But it's not "working".

Upvotes: 11

Views: 17452

Answers (2)

daedalus

Reputation: 10923

This is one way to do it -- load all sheets into a dictionary of dataframes and then concatenate all the values in the dictionary into one dataframe.

import pandas as pd

# Set sheet_name to None to load all sheets into a dict of dataframes,
# keyed by sheet name
df = pd.read_excel('tmp.xlsx', sheet_name=None, index_col=None)

# Then concatenate all dataframes; ignore_index=True renumbers the rows so
# index values from different sheets do not overlap (see comment by @bunji)
cdf = pd.concat(df.values(), ignore_index=True)

print(cdf)
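
As for the loop in the question: DataFrame.append returned a new dataframe instead of modifying b in place, so each result was discarded, and the method was removed in pandas 2.0. Below is a minimal sketch of a loop-based equivalent that collects the frames in a list and concatenates once. The in-memory dict `sheets` and its column names are hypothetical stand-ins for the workbook, so the example runs without a file.

```python
import pandas as pd

# Hypothetical stand-in for the workbook: two "sheets" with identical headers.
sheets = {
    "Sheet1": pd.DataFrame({"name": ["a", "b"], "score": [1, 2]}),
    "Sheet2": pd.DataFrame({"name": ["c", "d"], "score": [3, 4]}),
}

frames = []
# With a real workbook this would be: for sheet_name in xl.sheet_names
for sheet_name in sheets:
    # With a real workbook this would be: xl.parse(sheet_name)
    frames.append(sheets[sheet_name])

# Concatenate once at the end; ignore_index=True renumbers the rows.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Collecting into a list and calling pd.concat once also avoids copying the accumulated data on every iteration.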

Upvotes: 27

sivamani

Reputation: 1

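import pandas as pd

f = 'file.xlsx'
# sheet_name=None returns a dict mapping each sheet name to a dataframe
df = pd.read_excel(f, sheet_name=None)
# ignore_index is an argument of concat, not of read_excel
df2 = pd.concat(df.values(), ignore_index=True, sort=True)

# Writing with engine='xlsxwriter' requires the xlsxwriter package
df2.to_excel('merged.xlsx',
             engine='xlsxwriter',
             sheet_name='Merged',
             header=True,
             index=False)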
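
If it matters which sheet each row came from, the dict keys returned by sheet_name=None can be kept as a column. Here is a minimal sketch, assuming an in-memory dict in place of the read_excel result; the sheet names, column names and the 'sheet' label are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(f, sheet_name=None).
sheets = {
    "Jan": pd.DataFrame({"name": ["a", "b"], "score": [1, 2]}),
    "Feb": pd.DataFrame({"name": ["c", "d"], "score": [3, 4]}),
}

# Passing the dict itself makes the keys the outer level of a MultiIndex.
stacked = pd.concat(sheets, names=["sheet", "row"])

# Move the sheet name into a regular column and renumber the rows.
merged = stacked.reset_index(level="sheet").reset_index(drop=True)
print(merged)
```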

Upvotes: 0
