shantanuo
shantanuo

Reputation: 32346

Rename column headings using regular expression

I have this excel file that has first 9 rows as heading.

df = pd.read_excel('https://testme162.s3.amazonaws.com/test_file.xlsx', header=list(range(9)))
df.columns = df.columns.map(lambda h: '{}_{}_{}_{}_{}_{}_{}_{}_{}'.format(h[0], h[1], h[2], h[3],h[4], h[5],h[6], h[7], h[8]))

I am trying to rename the column headings by removing unnecessary part i.e. _Unnamed: XXX

For e.g. the first 4 column headings should be:

'_index', '_type', '_id', '_score'

And not something like this...

'_index_Unnamed: 0_level_1_Unnamed: 0_level_2_Unnamed: 0_level_3_Unnamed: 0_level_4_Unnamed: 0_level_5_Unnamed: 0_level_6_Unnamed: 0_level_7_Unnamed: 0_level_8',
 '_type_Unnamed: 1_level_1_Unnamed: 1_level_2_Unnamed: 1_level_3_Unnamed: 1_level_4_Unnamed: 1_level_5_Unnamed: 1_level_6_Unnamed: 1_level_7_Unnamed: 1_level_8',
 '_id_Unnamed: 2_level_1_Unnamed: 2_level_2_Unnamed: 2_level_3_Unnamed: 2_level_4_Unnamed: 2_level_5_Unnamed: 2_level_6_Unnamed: 2_level_7_Unnamed: 2_level_8',
 '_score_Unnamed: 3_level_1_Unnamed: 3_level_2_Unnamed: 3_level_3_Unnamed: 3_level_4_Unnamed: 3_level_5_Unnamed: 3_level_6_Unnamed: 3_level_7_Unnamed: 3_level_8',
 '_source_eventVersion_Unnamed: 4_level_2_Unnamed: 4_level_3_Unnamed: 4_level_4_Unnamed: 4_level_5_Unnamed: 4_level_6_Unnamed: 4_level_7_Unnamed: 4_level_8',

How do I rename column headings using regular expression?

Upvotes: 2

Views: 89

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use MultiIndex.map with custom lambda function mapper which flattens the MultiIndex after removing unused levels:

mapper = lambda s: '_'.join(
    pd.Series(s).mask(lambda x: x.str.contains('Unnamed')).dropna())
    
df.columns = df.columns.map(mapper)

Result:

print(df.columns)

Index(['_index', '_type', '_id', '_score', '_source_eventVersion',
       '_source_userIdentity_type', '_source_userIdentity_principalId',
       ....
       '_source_sharedEventID', '_source_serviceEventDetails_snapshotId',
       '_source_errorCode', '_source_errorMessage'],
      dtype='object', length=163)

Upvotes: 1

Related Questions