M PAUL

Reputation: 1248

Manipulating a CSV file in Python

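import csv

# Python 2.7 on Windows: open CSV files in binary mode to avoid extra blank rows.
# The with-block also closes (and flushes) both files when the loop finishes.
with open('Names_Duplicates.csv', 'rb') as src, open('Names_NoDuplicates.csv', 'wb') as dst:
    reader = csv.reader(src, delimiter=',')
    writer = csv.writer(dst, delimiter=',')

    names = set()  # values of column 0 that have already been written
    for row in reader:
        if row[0] not in names:
            writer.writerow(row)
            names.add(row[0])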

I am using this code to remove duplicates from a CSV file using Python 2.7 (Windows). It removes duplicates based on one column at a time. Is there any way I can remove duplicates based on multiple columns at the same time?

Any help is appreciated.

P.S. The pandas library is not working on my system.

Upvotes: 0

Views: 96

Answers (1)

Ignacio Vazquez-Abrams

Reputation: 799450

Use a tuple of multiple items as the key.

import operator
...
fieldmatches = set()
fieldspec = operator.itemgetter(0, 2, 3)  # for example: columns 0, 2 and 3
for row in reader:
    key = fieldspec(row)  # tuple of the selected column values
    if key not in fieldmatches:
        writer.writerow(row)
        fieldmatches.add(key)
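
For reference, here is a minimal self-contained sketch of the same tuple-key idea. The helper name `dedupe_rows`, the `columns` parameter, and the sample data are made up for illustration and are not part of the original code. It builds the key with `tuple(...)` instead of `operator.itemgetter`, because `itemgetter` called with a single index returns a bare value rather than a tuple. The sketch reads from an in-memory string and is written for Python 3; the file handling from the question would still be needed to work on real files.

```python
import csv
import io


def dedupe_rows(rows, columns):
    """Yield each row whose values in `columns` have not been seen before.

    `dedupe_rows` and `columns` are hypothetical names used for this sketch.
    """
    seen = set()
    for row in rows:
        # Tuples are hashable, so they can be stored in a set as a combined key.
        key = tuple(row[i] for i in columns)
        if key not in seen:
            seen.add(key)
            yield row


# Made-up sample data: rows count as duplicates when columns 0 and 1 both match.
sample = io.StringIO("Ann,Lee,NY\nAnn,Lee,LA\nAnn,Ray,NY\nAnn,Lee,NY\n")
unique = list(dedupe_rows(csv.reader(sample), columns=(0, 1)))
print(unique)
```

Only the first row for each combination of the chosen columns is kept, and the original row order is preserved.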

Upvotes: 2
