Concatenate all sheet files and create one new column with sheet names in Python

Question

Given a excel file test.xlsx which has many sheets: 1.DC, 2.SH and 3.GZ, etc.

import xlrd
dfs = xlrd.open_workbook('./test.xlsx', on_demand = True)
print(dfs.sheet_names())

Out:

['1.DC', '2.SH', '3.GZ']

How could I read and concatenate all the sheets and create a new column with sheet names while removing 1., 2., 3., etc. in the starting part of each sheet names?

The expected result will like this:

jezrael · Accepted Answer

Use read_excel with sheet_name=None for convert all sheets to DataFrames:

dfs = pd.read_excel('test.xlsx', sheet_name=None)

dfs = {k: v.loc[:, ~v.columns.str.contains('Unnamed')] for k, v in dfs.items()}

# print(dfs)

Then if need one column use lsit comparehension with select this column, here name

df = pd.DataFrame([(x, k) for k, v in dfs.items() for x in v['name']], 
                  columns=['name','city'])
df['city'] = df['city'].str.replace('^[0-9\.]+', '')
print (df)
      name    city
0    James      DC
1     Bond      DC
2   Steven      DC
3   Walker      SH
4      Tom      SH
5   Filler      GZ
6   Cooker      GZ
7      Tim      GZ

Or if need all columns omit [name] here:

dfs = pd.read_excel('test.xlsx', sheet_name=None)
print(dfs) 

df = (pd.concat(dfs)
        .reset_index(level=1, drop=True)
        .rename_axis('city')['name']
        .reset_index())
df['city'] = df['city'].str.replace('^[0-9\.]+', '')
print (df)
      city    name
0       DC   James
1       DC    Bond
2       DC  Steven
3       SH  Walker
4       SH     Tom
5       GZ  Filler
6       GZ  Cooker
7       GZ     Tim

Concatenate all sheet files and create one new column with sheet names in Python

Answers (1)

Related Questions