Niuya
Niuya

Reputation: 424

How to rename pandas dataframe column name by checking columns's data

Example df would be:

   a         b  c  d   e
0  SN123456  3  5  7   SN123456
1  SN456123  4  6  8   SN456123

I am wondering how I can rename the column name from 'a' to 'Serial_Number' base on the data -- it starts with 'SN' and length is fix:8.

(we may not know the name of 'a' as it read from some csv file, also the position is not known)

Also how to remove duplicated column 'e', it's completely duplicated with column 'a'

Any idea on a faster way? Loop each column serial and get it's index and rename column's name is somehow not a good method. Thanks!

Upvotes: 0

Views: 165

Answers (1)

Brad Solomon
Brad Solomon

Reputation: 40918

Here's a rewrite in response to your comment. This will rename + drop in a vectorized fashion.

Given df:

>>> df
          a  b  c  d         e  f  g
0  SN123456  3  5  7  SN123456  0  0
1  SN456123  4  6  8  SN456123  0  0

Create 3 boolean masks of the same length as the columns:

>>> mask1 = df.dtypes == 'object'
>>> mask2 = df.iloc[0].str.len() == 8
>>> mask3 = df.iloc[0].str.startswith('SN')

Use these to identify which columns look like serial numbers. The first will be renamed; the rest will be dropped.

>>> rename, *drop = df.columns[mask1 & mask2 & mask3]

Then rename + drop:

>>> rename
'a'

>>> drop
['e']

>>> df.rename(columns={rename: 'Serial_Number'}).drop(drop, axis=1)
  Serial_Number  b  c  d  f  g
0      SN123456  3  5  7  0  0
1      SN456123  4  6  8  0  0

Upvotes: 1

Related Questions