Reputation: 1285
I am trying to remove duplicate columns from my DataFrame.
My code is below:
import pandas as pd

xls = pd.ExcelFile('Base File.xlsx')
mapping_df = xls.parse('Mapping')
engagement_data_df = xls.parse('Detail Report')
engagement_data_df = engagement_data_df.loc[:, ~engagement_data_df.columns.duplicated()]
I have two duplicate columns called 'BCS Attached Flag'. I tried to deduplicate them with the code above, but it did not work. What am I doing wrong?
Adrian
Edit: It seems that the duplicate column gets .1 appended to its name, but in the csv file both columns are named BCS Attached Flag. This is the output of print(engagement_data_df.head(10)):
Division Region BCS Attached Flag BCS Attached Flag.1
China China A Y Y
Singapore Singapore B Y Y
Upvotes: 0
Views: 57
Reputation: 862751
I think you first need to extract only the text from the column names and then call duplicated:
m = ~engagement_data_df.columns.str.extract('([a-zA-Z]+)', expand=False).duplicated()
engagement_data_df = engagement_data_df.loc[:, m]
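Some background on why the original attempt changes nothing: by default pandas renames repeated headers while reading the sheet, appending .1, .2 and so on, so by the time columns.duplicated() runs every name is already unique. Note also that the pattern above keeps only the first run of letters in each name, so two different columns that merely start with the same word (say, 'BCS Attached Flag' and a hypothetical 'BCS Owner') would be treated as duplicates. A narrower variant, offered only as a sketch, is to strip the trailing .N suffix and compare what remains. The sample frame below is made up to mirror the printed output in the question; it is not the real 'Detail Report' sheet.

```python
import pandas as pd

# Hypothetical stand-in for the sheet after pandas has read it: the second
# 'BCS Attached Flag' header has already been renamed with a '.1' suffix.
engagement_data_df = pd.DataFrame({
    "Division": ["China", "Singapore"],
    "Region": ["China A", "Singapore B"],
    "BCS Attached Flag": ["Y", "Y"],
    "BCS Attached Flag.1": ["Y", "Y"],
})

# Every name is distinct at this point, so the original mask drops nothing.
print(engagement_data_df.columns.duplicated().any())

# Strip a trailing '.<digits>' suffix, then look for repeats among the base names.
base_names = engagement_data_df.columns.str.replace(r"\.\d+$", "", regex=True)
deduped_df = engagement_data_df.loc[:, ~base_names.duplicated()]
print(list(deduped_df.columns))
```

This keeps the first occurrence of each base name. The trade-off is that a column whose real name legitimately ends in .1 would also be dropped if its base name appears elsewhere, so check the headers before relying on it.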
Upvotes: 1