remove duplicate columns from pandas read excel dataframe

Question

My requirement is slightly different: I have more than 100 columns, and the column names can contain '.' (dot). A sample dataframe:

import pandas as pd
df = pd.DataFrame(columns=['A', 'B', 'C', 'A', 'D. s'])

So I cannot simply cut each name at the first '.' to get rid of suffixes like '.1' or '.2'.
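To illustrate the problem, here is a minimal sketch of that naive approach (keep everything before the first dot). The column list is an assumed example of what read_excel might return for the sample sheet:

```python
import pandas as pd

# Assumed column names, as read_excel might return them for the sample sheet
cols = pd.Index(['A', 'B', 'C', 'A.1', 'D. s'])

# Naive fix: keep everything before the first '.'
naive = [c.split('.', 1)[0] for c in cols]

# 'A.1' correctly becomes 'A', but the legitimate name 'D. s' is also truncated to 'D'
print(naive)  # ['A', 'B', 'C', 'A', 'D']
```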

Also, when I read the file with read_excel, duplicate column names are renamed to A, A.1, A.2 and so on, so the names are already unique and even the following command won't work:

df = df.loc[:,~df.columns.duplicated()]
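A quick way to see this without an Excel file is a sketch that uses read_csv on an in-memory CSV as a stand-in; the assumption is that read_excel renames duplicate headers the same way (A, A.1, A.2), as described above:

```python
import io

import pandas as pd

# In-memory CSV standing in for the Excel sheet: three 'A' headers and one name with a dot
csv = io.StringIO("A,B,C,A,D. s,A\n1,2,3,4,5,6\n")
df = pd.read_csv(csv)

print(list(df.columns))               # ['A', 'B', 'C', 'A.1', 'D. s', 'A.2']
print(df.columns.duplicated().any())  # False: the renamed headers are already unique
```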

I want to drop A and A.1 and keep A.2 (the last occurrence).

Please suggest the way forward.

anky · Accepted Answer

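IIUC, you can first strip the numeric suffix that pandas appends to duplicate names (a dot followed by digits at the end of the name) and then keep only the last occurrence of each name. Pass regex=True explicitly: since pandas 2.0, str.replace treats the pattern as a literal string by default.

df = df.loc[:, ~df.columns.str.replace(r'\.\d+$', '', regex=True).duplicated(keep='last')]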
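A self-contained sketch of the answer, again using read_csv on an in-memory CSV as an assumed stand-in for the Excel sheet. The pattern is anchored with $ so that only a trailing ".<digits>" is stripped, and names like 'D. s' are left alone:

```python
import io

import pandas as pd

# Stand-in for the Excel sheet: three 'A' columns plus a name containing a dot
df = pd.read_csv(io.StringIO("A,B,C,A,D. s,A\n1,2,3,4,5,6\n"))

# Strip only a trailing ".<digits>" suffix to recover the base name of each column
base = df.columns.str.replace(r'\.\d+$', '', regex=True)

# Keep the last column for each base name (A.2), drop the earlier ones (A, A.1)
df = df.loc[:, ~base.duplicated(keep='last')]

print(list(df.columns))   # ['B', 'C', 'D. s', 'A.2']
print(df['A.2'].iloc[0])  # 6
```

One caveat with this approach: a genuine header that itself ends in a dot and digits (for example "v.2") would be treated as a renamed duplicate of "v" if a column "v" also exists.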
