sargupta

Reputation: 1033

Removing unwanted characters and editing Column names in pandas

I have a pandas DataFrame with the following column names:

u'Kanta/City', u'Aluepaso/Regional Level', u'Akue/District', u'Seotukartakudi/Map code', u'k�/Age', u'2015', u'2016', u'2017', u'2018'.

I would like to rename the columns in one line of code, so that they become:

'City', 'Regional_Level', 'District', 'Map_Code', 'Age', '2015', '2016', '2017', '2018'.

Is there an efficient way to do this, for example with a lambda function?

Upvotes: 4

Views: 3807

Answers (2)

Karn Kumar

Reputation: 8816

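The simplest approach is to use str.replace with a regular expression. This strips everything up to the last slash; as the output shows, spaces in the remaining names are left untouched.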

>>> df
Empty DataFrame
Columns: [Kanta/City, Aluepaso/Regional Level, Akue/District, Seotukartakudi/Map code, k�/Age, 2015, 2016, 2017, 2018]
Index: []

>>> df.columns.str.replace(r'.*[\\\/]', '', regex=True)
Index(['City', 'Regional Level', 'District', 'Map code', 'Age', '2015', '2016',
       '2017', '2018'],
      dtype='object')

Regex explanation:

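. matches any character (except for line terminators)

* Quantifier: matches the preceding token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

[\\\/] matches a single character present in the list below

\\ matches the character \ literally (case sensitive)

\/ matches the character / literally (case sensitive)

Taken together, .*[\\\/] greedily matches everything up to and including the last / or \ in the name, so replacing the match with an empty string keeps only the text after the final separator. Names without a separator, such as 2015, do not match and are left unchanged.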
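To also get the underscores asked for in the question, the regex replacement can be chained with a second, literal replacement and assigned back to df.columns. The following is only a sketch: it uses a subset of the question's column names, and it assumes a pandas version where str.replace accepts the regex keyword (it is passed explicitly because the default changed to False in pandas 2.0).

```python
import pandas as pd

# A subset of the column names from the question, on an empty frame.
df = pd.DataFrame(columns=['Kanta/City', 'Aluepaso/Regional Level',
                           'Seotukartakudi/Map code', '2015'])

df.columns = (
    df.columns
    # Drop everything up to and including the last / or \ (regex match).
    .str.replace(r'.*[\\\/]', '', regex=True)
    # Replace the remaining spaces with underscores (literal match).
    .str.replace(' ', '_', regex=False)
)

print(list(df.columns))
```

Note that this keeps the original capitalisation, so the result contains Map_code rather than the Map_Code shown in the question.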

Upvotes: 0

Mohit Motwani

Reputation: 4792

Using lambda:

df.rename(columns=lambda x: x.split('/')[1].replace(' ', '_') if '/' in x else x, inplace=True)

df.columns
> Index(['City', 'Regional_Level', 'District', 'Map_code', 'Age', '2015', '2016',
           '2017', '2018'],
          dtype='object')
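The lambda above yields Map_code, while the question asks for Map_Code. If the capitalised form matters, one possible variation is a small named function passed to rename. This is a sketch, not part of the original answer: clean_name is a hypothetical helper, and it assumes every word after the slash should start with an upper-case letter.

```python
import pandas as pd

def clean_name(name):
    # Hypothetical helper: keep only the text after the last '/',
    # or the whole name when there is no slash.
    tail = name.rsplit('/', 1)[-1]
    # Upper-case the first letter of each word, leave the rest as is,
    # and join the words with underscores.
    return '_'.join(word[:1].upper() + word[1:] for word in tail.split())

# A subset of the column names from the question, on an empty frame.
df = pd.DataFrame(columns=['Kanta/City', 'Aluepaso/Regional Level',
                           'Seotukartakudi/Map code', '2015'])

# rename accepts any callable for columns, so a lambda is not required.
df = df.rename(columns=clean_name)

print(list(df.columns))
```

Using rsplit with a limit of 1 means a name containing several slashes still keeps only its last segment, whereas split('/')[1] would keep the second one.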

Upvotes: 4
