Reputation: 337
I am trying to compare two csv files to look for common values in column 1.
import csv
f_d1 = open('test1.csv')
f_d2 = open('test2.csv')
f1_csv = csv.reader(f_d1)
f2_csv = csv.reader(f_d2)
for x,y in zip(f1_csv,f2_csv):
print(x,y)
I am trying to compare x[0] with y[0]. I am fairly new to python and trying to find the most pythonic way to achieve the results. Here is the csv files.
test1.csv
Hadrosaurus,1.2
Struthiomimus,0.92
Velociraptor,1.0
Triceratops,0.87
Euoplocephalus,1.6
Stegosaurus,1.4
Tyrannosaurus Rex,2.5
test2.csv
Euoplocephalus,1.87
Stegosaurus,1.9
Tyrannosaurus Rex,5.76
Hadrosaurus,1.4
Deinonychus,1.21
Struthiomimus,1.34
Velociraptor,2.72
Upvotes: 0
Views: 591
Reputation: 57033
Assuming that the files are not prohibitively large, you can read both of them with a CSV reader, convert the first columns to sets, and calculate the set intersection:
with open('test1.csv') as f:
set1 = set(x[0] for x in csv.reader(f))
with open('test2.csv') as f:
set2 = set(x[0] for x in csv.reader(f))
print(set1 & set2)
#{'Hadrosaurus', 'Euoplocephalus', 'Tyrannosaurus Rex', 'Struthiomimus',
# 'Velociraptor', 'Stegosaurus'}
Upvotes: 0
Reputation: 21
I believe you're looking for the set intersection:
import csv
f_d1 = open('test1.csv')
f_d2 = open('test2.csv')
f1_csv = csv.reader(f_d1)
f2_csv = csv.reader(f_d2)
x = set([item[0] for item in f1_csv])
y = set([item[0] for item in f2_csv])
print(x & y)
Upvotes: 1
Reputation: 13642
Take advantage of the defaultdict
in Python and you can iterate both the files and maintain the count in a dictionary like this
from collections import defaultdict
d = defaultdict(list)
for row in f1_csv:
d[row[0]].append(row[1])
for row in f2_csv:
d[row[0]].append(row[1])
d = {k: d[k] for k in d if len(d[k]) > 1}
print(d)
Output:
{'Hadrosaurus': ['1.2', '1.4'], 'Struthiomimus': ['0.92', '1.34'], 'Velociraptor': ['1.0', '2.72'],
'Euoplocephalus': ['1.6', '1.87'], 'Stegosaurus': ['1.4', '1.9'], 'Tyrannosaurus Rex': ['2.5', '5.76']}
Upvotes: 0
Reputation: 21
I added a line to test whether the numerical values in each row are the same. You can modify this to test whether, for instance, the values are within some distance of each other:
import csv
f_d1 = open('test1.csv')
f_d2 = open('test2.csv')
f1_csv = csv.reader(f_d1)
f2_csv = csv.reader(f_d2)
for x,y in zip(f1_csv,f2_csv):
if x[1] == y[1]:
print('they match!')
Upvotes: 0