Reputation: 3677
I have a df that looks like this:
col1_test col1_test.1
abc NaN
How do I drop only the .1
while keeping all the other characters in the column name?
current code to drop .1
:
df.columns = df.columns.str.extract(r'\.?', expand=False)
but this is dropping the other characters in the column name like underscore.
New df:
col1_test col1_test
abc NaN
Once this part is set, I will merge the columns using this:
df = df.groupby(level=0, axis=1).first()
Upvotes: 0
Views: 577
Reputation: 402902
This is not recommended because it becomes difficult to index specific columns when there are duplicate headers.
A better solution, however, since trying to perform a groupby
, would be to pass a callable.
df
col1_test col1_test.1
0 abc NaN
df.groupby(by=lambda x: x.rsplit('.', 1)[0], axis=1).first()
col1_test
0 abc
For reference, you'd remove column suffixes with str.replace
:
df.columns = df.columns.str.replace(r'\.\d+$', '')
You can also use str.rsplit
:
df.columns = df.columns.str.rsplit('.', 1).str[0]
df
col1_test col1_test
0 abc NaN
Upvotes: 2