Reputation: 10031
Given a excel file test.xlsx which has many sheets: 1.DC
, 2.SH
and 3.GZ
, etc.
import xlrd
dfs = xlrd.open_workbook('./test.xlsx', on_demand = True)
print(dfs.sheet_names())
Out:
['1.DC', '2.SH', '3.GZ']
How could I read and concatenate all the sheets and create a new column with sheet names while removing 1., 2., 3., etc.
in the starting part of each sheet names?
The expected result will like this:
Upvotes: 1
Views: 838
Reputation: 862771
Use read_excel
with sheet_name=None
for convert all sheets to DataFrame
s:
dfs = pd.read_excel('test.xlsx', sheet_name=None)
dfs = {k: v.loc[:, ~v.columns.str.contains('Unnamed')] for k, v in dfs.items()}
# print(dfs)
Then if need one column use lsit comparehension with select this column, here name
df = pd.DataFrame([(x, k) for k, v in dfs.items() for x in v['name']],
columns=['name','city'])
df['city'] = df['city'].str.replace('^[0-9\.]+', '')
print (df)
name city
0 James DC
1 Bond DC
2 Steven DC
3 Walker SH
4 Tom SH
5 Filler GZ
6 Cooker GZ
7 Tim GZ
Or if need all columns omit [name]
here:
dfs = pd.read_excel('test.xlsx', sheet_name=None)
print(dfs)
df = (pd.concat(dfs)
.reset_index(level=1, drop=True)
.rename_axis('city')['name']
.reset_index())
df['city'] = df['city'].str.replace('^[0-9\.]+', '')
print (df)
city name
0 DC James
1 DC Bond
2 DC Steven
3 SH Walker
4 SH Tom
5 GZ Filler
6 GZ Cooker
7 GZ Tim
Upvotes: 1