Reputation: 75
I have two csv files and both have a common column named 'Name'. How can I compare the two and write a new file without the duplicate values?
Code I have so far:
import pandas as pd
import csv
df1 = pd.read_csv('Stale1.csv')
df2 = pd.read_csv('Stale2.csv')
list1=df1['Name'].tolist()
list2=df2['Name'].tolist()
Example of the columns:
Name
aaa
bbb
ccc
ddd
Name
aaa
vvv
hhh
I would like to write a new file that does not contain 'aaa' as it is the duplicate. Desired output:
Name
bbb
ccc
ddd
vvv
hhh
Upvotes: 0
Views: 45
Reputation: 9806
You can concatenate the two dataframes and then use the drop_duplicates method:
df = pd.concat([df1, df2]).drop_duplicates(subset='Name', keep=False, ignore_index=True)
where subset='Name'
tells to consider only the column 'Name'
for identifying duplicates and keep=False
tells to drop all the duplicates.
To get the final result use:
df['Name'].tolist()
Upvotes: 1