Reputation: 32346
I have this excel file that has first 9 rows as heading.
df = pd.read_excel('https://testme162.s3.amazonaws.com/test_file.xlsx', header=list(range(9)))
df.columns = df.columns.map(lambda h: '{}_{}_{}_{}_{}_{}_{}_{}_{}'.format(h[0], h[1], h[2], h[3],h[4], h[5],h[6], h[7], h[8]))
I am trying to rename the column headings by removing unnecessary part i.e. _Unnamed: XXX
For e.g. the first 4 column headings should be:
'_index', '_type', '_id', '_score'
And not something like this...
'_index_Unnamed: 0_level_1_Unnamed: 0_level_2_Unnamed: 0_level_3_Unnamed: 0_level_4_Unnamed: 0_level_5_Unnamed: 0_level_6_Unnamed: 0_level_7_Unnamed: 0_level_8',
'_type_Unnamed: 1_level_1_Unnamed: 1_level_2_Unnamed: 1_level_3_Unnamed: 1_level_4_Unnamed: 1_level_5_Unnamed: 1_level_6_Unnamed: 1_level_7_Unnamed: 1_level_8',
'_id_Unnamed: 2_level_1_Unnamed: 2_level_2_Unnamed: 2_level_3_Unnamed: 2_level_4_Unnamed: 2_level_5_Unnamed: 2_level_6_Unnamed: 2_level_7_Unnamed: 2_level_8',
'_score_Unnamed: 3_level_1_Unnamed: 3_level_2_Unnamed: 3_level_3_Unnamed: 3_level_4_Unnamed: 3_level_5_Unnamed: 3_level_6_Unnamed: 3_level_7_Unnamed: 3_level_8',
'_source_eventVersion_Unnamed: 4_level_2_Unnamed: 4_level_3_Unnamed: 4_level_4_Unnamed: 4_level_5_Unnamed: 4_level_6_Unnamed: 4_level_7_Unnamed: 4_level_8',
How do I rename column headings using regular expression?
Upvotes: 2
Views: 89
Reputation: 71689
Use MultiIndex.map
with custom lambda function mapper
which flattens the MultiIndex
after removing unused levels:
mapper = lambda s: '_'.join(
pd.Series(s).mask(lambda x: x.str.contains('Unnamed')).dropna())
df.columns = df.columns.map(mapper)
Result:
print(df.columns)
Index(['_index', '_type', '_id', '_score', '_source_eventVersion',
'_source_userIdentity_type', '_source_userIdentity_principalId',
....
'_source_sharedEventID', '_source_serviceEventDetails_snapshotId',
'_source_errorCode', '_source_errorMessage'],
dtype='object', length=163)
Upvotes: 1