Reputation: 988
I am trying to remove duplicate rows but with some special condition. Input data is as below: I want to remove duplicate rows. I have to compare all the columns and hence I have to sort the dataframe based on CODE and hence I can compare the rows. If cell values are same (Have to ignore the upper and lower case while comparing the column CITY) and if the COMPANY column has both BELL and MAS, then I have to choose the with BELL. Thanks.
NAME CODE STATE CITY COMPANY
abc 109 TN Trichy BELL
abc 109 TN Salem MAS
pen 9 TN Mdu BELL
pen 9 TN MDU MAS
hat 10 TN Karur MAS
mat 89 TN Hosur MAS
mat 89 TN Hosur BELL
paper 77 TN Nyl BELL
stone 98 TN Nyl BELL
stone 98 TN Nyl MAS
Results expected:
NAME CODE STATE CITY COMPANY
abc 109 TN Trichy BELL
abc 109 TN Salem MAS
pen 9 TN Mdu BELL
hat 10 TN Karur MAS
mat 89 TN Hosur BELL
paper 77 TN Nyl BELL
stone 98 TN Nyl BELL
Upvotes: 0
Views: 42
Reputation: 869
You can try the following steps:
Converting the CITY column to lowercase
Sorting the data according to COMPANY (considering that you need to keep BELL one)
removing duplicates based on other columns excluding COMPANY
data.CITY =data.CITY.str.lower()
data.sort_values(by='COMPANY',inplace=True)
data.drop_duplicates(subset=['NAME','CODE','STATE','CITY'],keep="first")
Hope it helps.
Upvotes: 2