Reputation: 1248
import csv
# Python 2's csv module expects binary mode; on Windows, text mode writes extra blank rows
reader=csv.reader(open('Names_Duplicates.csv', 'rb'),delimiter=',')
writer=csv.writer(open('Names_NoDuplicates.csv', 'wb'),delimiter=',')
Names=set()
for row in reader:
    if row[0] not in Names:
        writer.writerow(row)
        Names.add(row[0])
I am using this code to remove duplicates from a CSV file with Python 2.7 (Windows). It removes duplicates based on one column at a time. Is there any way I can remove duplicates based on multiple columns at the same time?
Any help is appreciated.
P.S. The pandas library is not working on my system.
Upvotes: 0
Views: 96
Reputation: 799450
Use a tuple of multiple items as the key.
import operator
...
fieldmatches = set()
fieldspec = operator.itemgetter(0, 2, 3) # for example
for row in reader:
    if fieldspec(row) not in fieldmatches:
        writer.writerow(row)
        fieldmatches.add(fieldspec(row))
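For what it's worth, here is a minimal self-contained sketch of the same idea that you can run without any files. The helper name `dedupe_rows`, the `columns` parameter, and the sample data are made up for illustration; it builds the tuple by hand instead of using `operator.itemgetter`, so it also works when you pass a single column.

```python
import csv

def dedupe_rows(rows, columns):
    """Yield each row whose values in `columns` have not been seen before."""
    seen = set()
    for row in rows:
        # A tuple of the chosen fields is hashable, so it can serve as a set key.
        key = tuple(row[i] for i in columns)
        if key not in seen:
            seen.add(key)
            yield row

# Sample data, invented for this example; csv.reader accepts any iterable of lines.
lines = [
    "Ann,Smith,London",
    "Ann,Jones,London",
    "Ann,Smith,Paris",
    "Bob,Smith,London",
]

# Treat rows as duplicates when columns 0 and 1 both match.
unique = list(dedupe_rows(csv.reader(lines), columns=(0, 1)))
for row in unique:
    print(",".join(row))
```

Only the first row for each (column 0, column 1) pair is kept, so "Ann,Smith,Paris" is dropped here. A row shorter than the largest index in `columns` would raise an IndexError, so ragged files need a length check first.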
Upvotes: 2