Michal K
Michal K

Reputation: 255

Manipulating csv files with Python

Im trying to output the difference between 2 csv files by two columns and create a third csv file. How can I make the following code compare by columns 0 and 3.

import csv

f1 = open ("ted.csv")
oldFile1 = csv.reader(f1, delimiter=',')
oldList1 = list(oldFile1)

f2 = open ("ted2.csv")
newFile2 = csv.reader(f2, delimiter=',')
newList2 = list(newFile2)

f1.close()
f2.close()

output1 = set(tuple(row) for row in newList2 if row not in oldList1)
output2 = set(tuple(row) for row in oldList1 if row not in newList2)

with open('Michal_K.csv','w') as csvfile:
      wr = csv.writer(csvfile,delimiter=',')
      for line in (output2).difference(output1):
          wr.writerow(line)   

Upvotes: 3

Views: 709

Answers (1)

Padraic Cunningham
Padraic Cunningham

Reputation: 180391

If you want the rows from ted.csv that do not have any of the same third and fourth column elements as ted2, create a set of those elements from the ted2 and check each row from ted.csv before writing:

with open("ted.csv") as f1, open("ted2.csv") as f2, open('foo.csv', 'w') as out:
    r1, r2 = csv.reader(f1), csv.reader(f2)
    st = set((row[0], row[3]) for row in r1)
    wr = csv.writer(out)
    for row in (row for row in r2 if (row[0],row[3]) not in st):
          wr.writerow(row)   

If you actually want something like the symmetric difference where you get unique rows from both then make a set of each third and fourth columns from both files :

from itertools import chain

with open("ted.csv") as f1, open("ted2.csv") as f2, open('foo.csv', 'w') as out:
    r1, r2 = csv.reader(f1), csv.reader(f2)
    st1 = set((row[0], row[3]) for row in r1)
    st2 = set((row[0], row[3]) for row in r2)
    f1.seek(0), f2.seek(0)
    wr = csv.writer(out)
    r1, r2 = csv.reader(f1), csv.reader(f2)
    output1 = (row for row in r1 if (row[0], row[3]) not in st2)
    output2 = (row for row in r2 if (row[0], row[3]) not in st1)
    for row in chain.from_iterable((output1, output2)):
        wr.writerow(row)

Upvotes: 2

Related Questions