user15240243
user15240243

Reputation: 27

How can I concatenate multiple excel sheets in a single file to one single file with an extra column that contains original sheet name?

I have an excel file with over one hundred sheets. Each sheet has the same columns with information. I have concatenated all the sheets in one single file.

Script

single_file = pd.concat(pd.read_excel('multiple_sheets.xlsx',sheet_name=None),ignore_index=True)
single_file.to_csv('single_file.csv')

The script works fine. The thing is I would like to add a column that identifies each row of information with the name of its original sheet. Is that possible?

Thanks in advance, Fernando

Upvotes: 0

Views: 539

Answers (2)

user2736738
user2736738

Reputation: 30926

single_file = pd.read_excel('multiple_sheets.xlsx',sheet_name=None)
single_file = pd.concat([sheet.assign(identifier=i) for i,sheet in single_file.items()])

Here we are using the idea that all sheets can be accessed by iterating (using items()) over the dictionary of dataframes.

Upvotes: 1

Stephen Mason
Stephen Mason

Reputation: 357

You'll want to save the pd.concat for the very end of the script.

You've read in the file with sheetname = None which creates a dictionary of sheet names as the key, and the dataframe as the value. Loop over each sheet and add your label column first, then concatenate.

Upvotes: 1

Related Questions