Abhishek Kulkarni
Abhishek Kulkarni

Reputation: 676

remove duplicate columns from pandas read excel dataframe

The closest answer is the link below. how to delete a duplicate column read from excel in pandas

My requirement is slightly different. I have more than 100 columns and the column names can have '.'(dot) in them.Sample dataframe is as follows:-

df = pd.DataFrame(columns=['A', 'B', 'C','A','D. s'])

So I cannot truncate based on '.1' or '.2'

Also when I read from excel, the column names are read as A, A.1 and A.2 and so on so even the following command won't work.

df = df.loc[:,~df.columns.duplicated()]

I want to drop A, A.1 and retain A.2

Please suggest the way forward.

Upvotes: 4

Views: 1654

Answers (2)

anky
anky

Reputation: 75100

IIUC , you can first remove the numbers after the dot and then keep only the last duplicates:

df.loc[:,~df.columns.str.replace('\.\d+','').duplicated(keep='last')]

Upvotes: 3

Mayank Porwal
Mayank Porwal

Reputation: 34086

Something like this could work:

In [552]: df                                                                                                                                                                                                
Out[552]: 
   A  A.1  A.2  A.3  B  B.1  B.3  B.s
0  1    2    3    4  1    2    1    2
1  2    3    4    5  2    4    4    3

In [554]: df.filter(regex=r'^(?!\w\.[0-9])')                                                                                                                                                                
Out[554]: 
   A  B  B.s
0  1  1    2
1  2  2    3

Upvotes: 1

Related Questions