RustyShackleford
RustyShackleford

Reputation: 3677

How to normalize column names with '.1' in the column name and not drop any other characters?

I have a df that looks like this:

col1_test   col1_test.1
abc          NaN

How do I drop only the .1 while keeping all the other characters in the column name?

current code to drop .1:

df.columns = df.columns.str.extract(r'\.?', expand=False)

but this is dropping the other characters in the column name like underscore.

New df:

col1_test   col1_test
abc          NaN

Once this part is set, I will merge the columns using this:

df = df.groupby(level=0, axis=1).first()

Upvotes: 0

Views: 577

Answers (1)

cs95
cs95

Reputation: 402902

This is not recommended because it becomes difficult to index specific columns when there are duplicate headers.

A better solution, however, since trying to perform a groupby, would be to pass a callable.

df
  col1_test  col1_test.1
0       abc          NaN

df.groupby(by=lambda x: x.rsplit('.', 1)[0], axis=1).first()

  col1_test
0       abc

For reference, you'd remove column suffixes with str.replace:

df.columns = df.columns.str.replace(r'\.\d+$', '') 

You can also use str.rsplit:

df.columns = df.columns.str.rsplit('.', 1).str[0]
df
   col1_test  col1_test
0        abc        NaN

Upvotes: 2

Related Questions