Reputation: 676
The closest answer is the link below. how to delete a duplicate column read from excel in pandas
My requirement is slightly different. I have more than 100 columns and the column names can have '.'(dot) in them.Sample dataframe is as follows:-
df = pd.DataFrame(columns=['A', 'B', 'C','A','D. s'])
So I cannot truncate based on '.1' or '.2'
Also when I read from excel, the column names are read as A, A.1 and A.2 and so on so even the following command won't work.
df = df.loc[:,~df.columns.duplicated()]
I want to drop A, A.1 and retain A.2
Please suggest the way forward.
Upvotes: 4
Views: 1654
Reputation: 75100
IIUC , you can first remove the numbers after the dot and then keep only the last duplicates:
df.loc[:,~df.columns.str.replace('\.\d+','').duplicated(keep='last')]
Upvotes: 3
Reputation: 34086
Something like this could work:
In [552]: df
Out[552]:
A A.1 A.2 A.3 B B.1 B.3 B.s
0 1 2 3 4 1 2 1 2
1 2 3 4 5 2 4 4 3
In [554]: df.filter(regex=r'^(?!\w\.[0-9])')
Out[554]:
A B B.s
0 1 1 2
1 2 2 3
Upvotes: 1