Reputation: 185
I am really new to Python and I have two CSV files. The first one (more.csv) contains:
A123,B456,C789
The second one (less.csv) contains:
B456
When an item appears in both files, I want to store it in a variable called "same".
I figure it would start with something like:
more = open('more.csv','r')
less= open('less.csv','r')
for item in unitid:
Thank you.
Upvotes: 1
Views: 668
Reputation: 4022
If each file has only one line, you can use Python's built-in set type to compare them. For instance:
>>> a = ['A123','B456','C789','D007']
>>> b = ['B456','D007','E009']
>>> c = sorted(set(a).intersection(b))
>>> print(c)
['B456', 'D007']
The full method to compare from files would look like:
def compare(fileA, fileB):
    # Read each file; strip() drops the trailing newline so the last
    # field still matches (otherwise 'B456\n' would not equal 'B456').
    with open(fileA, 'r') as a_file:
        a_data = a_file.read().strip()
    with open(fileB, 'r') as b_file:
        b_data = b_file.read().strip()
    # compare the contents
    a_set = set(a_data.split(','))
    b_set = set(b_data.split(','))
    return list(a_set.intersection(b_set))

same = compare('more.csv', 'less.csv')
If the files have more than one line each, you can still use this approach with a small modification: for example, compute the intersection for each pair of corresponding lines and store the results in a list, one entry per line.
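A rough sketch of that line-by-line variant, assuming line N of one file should be compared with line N of the other. The function name compare_by_line is made up for illustration, and the csv module is swapped in for the manual split(','):

```python
import csv

def compare_by_line(path_a, path_b):
    """Return one sorted list of shared fields per pair of corresponding rows."""
    # newline='' is the mode the csv module recommends for files it reads
    with open(path_a, newline='') as a_file, open(path_b, newline='') as b_file:
        rows_a = csv.reader(a_file)
        rows_b = csv.reader(b_file)
        # zip stops at the shorter file, so unmatched trailing rows are ignored
        return [sorted(set(row_a) & set(row_b))
                for row_a, row_b in zip(rows_a, rows_b)]
```

Calling compare_by_line('more.csv', 'less.csv') on the one-line files from the question should give a list with a single entry. If the files can differ in length and the extra rows matter, zip would need replacing with something that pads the shorter file.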
Upvotes: 2
Reputation: 50185
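Once you have processed your CSV files into lists, you can use collections.Counter to find values that appear in both. This assumes no value is repeated within a single list, since a repeat would also push its count above 1: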
from collections import Counter
# after processing your CSV files into two lists:
more_list = ['A123', 'B456', 'C789', 'D007']
less_list = ['B456', 'D007', 'E009']
dupe_counter = Counter(more_list)
dupe_counter.update(less_list)
same_list = [val for val in dupe_counter if dupe_counter[val] > 1]
# same_list will be: ['B456', 'D007']
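The "processing your CSV files into two lists" step is not shown above, so here is one possible way to do it, as a sketch only. The helper names read_values and find_same are hypothetical, and each list is passed through set() first so that a value repeated inside one file is not mistaken for a match:

```python
import csv
from collections import Counter

def read_values(path):
    """Flatten every field of a CSV file into one list of stripped strings."""
    with open(path, newline='') as f:
        return [field.strip() for row in csv.reader(f) for field in row]

def find_same(path_a, path_b):
    # set() removes repeats within one file, so a count of 2 means "in both files"
    counts = Counter(set(read_values(path_a)))
    counts.update(set(read_values(path_b)))
    return sorted(val for val, n in counts.items() if n > 1)
```

With the files from the question, same = find_same('more.csv', 'less.csv') should leave the shared item in same.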
Upvotes: 1