boyenec

Reputation: 1617

Pandas: how to compare two CSV files and delete duplicates?

Assume I have two CSV files, csv1 and csv2. I want to delete every record from csv2 that matches a record in csv1. Both CSV files have a unique identifier, sku.

csv1:

sku    name 
Gk125  Jhone
GK126  Mike

csv2:

sku    name
Gk127  Doe
GK128  Hock
GK126  Mike  # this is the duplicate record, which is already in csv1

My expected result for csv2 is:

sku    name
Gk127  Doe
GK128  Hock

I tried this, but it didn't work:

old_file = list(old['sku'])
updated = new[~new['sku'].isin(old)]
updated.to_csv('...my path/updated.csv')
        

Upvotes: 2

Views: 1405

Answers (1)

Muhammad Hassan

Reputation: 4229

Filtering with isin on the sku column works fine for me:

df1 = pd.DataFrame(data={'sku':['Gk125', 'GK126'], 'name':['Jhone', 'Mike']})
df2 = pd.DataFrame(data={'sku':['Gk127', 'GK128', 'GK126'], 'name':['Doe', 'Hock', 'Mike']})

print(df2[~df2['sku'].isin(df1['sku'])])

Output:

     sku  name
0  Gk127   Doe
1  GK128  Hock
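
For the CSV case, here is a minimal sketch of the same filter applied end to end. The in-memory `io.StringIO` buffers are stand-ins I am assuming in place of your real files, and the names `old`, `new` and `updated` follow the question. Note that the snippet in the question builds `old_file` but then passes `old` (the whole DataFrame) to `isin`, which may be why nothing was filtered; passing the `sku` column, as in the example above, avoids that.

```python
import io

import pandas as pd

# Stand-ins for the two files (assumption); with real data, pass file paths
# to pd.read_csv instead of these buffers.
csv1 = io.StringIO("sku,name\nGk125,Jhone\nGK126,Mike\n")
csv2 = io.StringIO("sku,name\nGk127,Doe\nGK128,Hock\nGK126,Mike\n")

old = pd.read_csv(csv1)
new = pd.read_csv(csv2)

# Pass the sku column (a Series), not the whole DataFrame, to isin,
# then keep only the rows of `new` whose sku is NOT present in `old`.
updated = new[~new['sku'].isin(old['sku'])]

# index=False keeps the row index out of the written output.
out = io.StringIO()
updated.to_csv(out, index=False)
print(out.getvalue())
```

With real files, replace the buffers with paths in `pd.read_csv(...)` and `to_csv(...)`. `index=False` is optional; it only stops pandas from writing the row index as an extra column. The match is exact, so stray whitespace or different letter case in `sku` would count as a different value.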

Upvotes: 4
