Reputation: 255
Im trying to output the difference between 2 csv files by two columns and create a third csv file. How can I make the following code compare by columns 0 and 3.
import csv
f1 = open ("ted.csv")
oldFile1 = csv.reader(f1, delimiter=',')
oldList1 = list(oldFile1)
f2 = open ("ted2.csv")
newFile2 = csv.reader(f2, delimiter=',')
newList2 = list(newFile2)
f1.close()
f2.close()
output1 = set(tuple(row) for row in newList2 if row not in oldList1)
output2 = set(tuple(row) for row in oldList1 if row not in newList2)
with open('Michal_K.csv','w') as csvfile:
wr = csv.writer(csvfile,delimiter=',')
for line in (output2).difference(output1):
wr.writerow(line)
Upvotes: 3
Views: 709
Reputation: 180391
If you want the rows from ted.csv that do not have any of the same third and fourth column elements as ted2, create a set of those elements from the ted2 and check each row from ted.csv before writing:
with open("ted.csv") as f1, open("ted2.csv") as f2, open('foo.csv', 'w') as out:
r1, r2 = csv.reader(f1), csv.reader(f2)
st = set((row[0], row[3]) for row in r1)
wr = csv.writer(out)
for row in (row for row in r2 if (row[0],row[3]) not in st):
wr.writerow(row)
If you actually want something like the symmetric difference
where you get unique rows from both then make a set of each third and fourth columns from both files :
from itertools import chain
with open("ted.csv") as f1, open("ted2.csv") as f2, open('foo.csv', 'w') as out:
r1, r2 = csv.reader(f1), csv.reader(f2)
st1 = set((row[0], row[3]) for row in r1)
st2 = set((row[0], row[3]) for row in r2)
f1.seek(0), f2.seek(0)
wr = csv.writer(out)
r1, r2 = csv.reader(f1), csv.reader(f2)
output1 = (row for row in r1 if (row[0], row[3]) not in st2)
output2 = (row for row in r2 if (row[0], row[3]) not in st1)
for row in chain.from_iterable((output1, output2)):
wr.writerow(row)
Upvotes: 2