Reputation:
I'm trying to write a function that accepts a list of filenames, reads two files at a time, and compares their values. Whenever a value matches, it should be added to a set, and the set is printed at the end. The problem is that the files do share some values, but the function prints an empty set.
def cross_reference(files):
    set_of_users = set()
    n = len(files)
    files = cycle(files)
    for index in range(n):
        with open(next(files), mode='r') as read_file:
            with open(next(files), mode='r') as read_file1:
                for contact in read_file:
                    for contact1 in read_file1:
                        if contact == contact1:
                            set_of_users.add(contact)
                            break
    print(set_of_users)
The files contain the following values:
file1.txt:
0709-12345
0724-87234
0723-67890
0721-16273
file2.txt:
0709-87263
0743-76346
0724-87234
0777-89264
file3.txt:
0724-87234
0743-87469
0709-12398
0709-78548
0724-87234 is common to all files but is not added to the set.
Upvotes: 0
Views: 98
Reputation: 189689
Your code seems unnecessarily complex. Perhaps you have reasons for that, but absent those reasons, something simpler like this would make more sense:
from collections import defaultdict
import glob

def cross_reference(files):
    # Map each value to the set of files in which it appears.
    seen = defaultdict(set)
    for file in files:
        with open(file) as lines:
            for line in lines:
                seen[line.rstrip('\n')].add(file)
    # Print only the values that were seen in every file.
    for item in seen.keys():
        if seen[item] == set(files):
            print(item)

cross_reference(glob.glob('file[1-3].txt'))
Upvotes: 0
Reputation: 1996
One reason might be that you are not stripping the contents of each line. So one line could be '0724-87234\n' and the other could be '0724-87234' without the '\n'.
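A minimal sketch of that mismatch, using plain strings rather than the real files. The sample value comes from the question; the variable names are only for illustration.

```python
# A line read from a file usually keeps its trailing newline,
# but the last line of a file may not have one.
with_newline = '0724-87234\n'
without_newline = '0724-87234'

# The raw strings differ, so a direct equality test fails.
raw_match = with_newline == without_newline

# Stripping the line ending from both sides makes them comparable.
stripped_match = with_newline.rstrip('\n') == without_newline.rstrip('\n')

print(raw_match, stripped_match)
```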
I would go for another approach like this:
def cross_reference(files):
    common_values = None
    for file in files:
        with open(file) as f:
            values = set([line.rstrip() for line in f.readlines()])
        if common_values is None:
            common_values = values
        else:
            common_values = common_values & values
    return common_values
Instead of looping over the range of the length of files, you can simply loop over the files themselves. For each file, you read all the lines into a list and apply the rstrip() method to each line to get rid of the trailing '\n'. This is done with a list comprehension. You then turn this list into a set. With sets, you can easily get the common values of two sets with set1 & set2.
Here, I set common_values to None at the beginning. In the loop, I check whether it is None and, if so, assign it the set from the first file. For each subsequent file, I apply the & operation between the new file's set and common_values. So after each file, common_values contains only the lines that are present in all files checked up to that point.
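As a variation on the same idea, the per-file sets can be collected first and intersected in one call. This is only a sketch: the name common_lines is made up for illustration, and the temporary files simply recreate the sample data from the question so the snippet runs on its own.

```python
import os
import tempfile


def common_lines(paths):
    """Return the set of stripped lines that appear in every file in paths."""
    line_sets = []
    for path in paths:
        with open(path) as f:
            line_sets.append({line.rstrip() for line in f})
    # set.intersection needs at least one set, so handle an empty list explicitly.
    if not line_sets:
        return set()
    return set.intersection(*line_sets)


# Throwaway files holding the sample data from the question.
samples = [
    ['0709-12345', '0724-87234', '0723-67890', '0721-16273'],
    ['0709-87263', '0743-76346', '0724-87234', '0777-89264'],
    ['0724-87234', '0743-87469', '0709-12398', '0709-78548'],
]
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, values in enumerate(samples, start=1):
        path = os.path.join(tmp, f'file{i}.txt')
        with open(path, 'w') as f:
            f.write('\n'.join(values) + '\n')
        paths.append(path)
    result = common_lines(paths)

print(result)  # {'0724-87234'}
```

Returning an empty set for an empty list of files is a design choice made here; the version above returns None in that case.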
Upvotes: 1
Reputation: 12140
The inner loop, for contact1 in read_file1:, consumes read_file1 during the first iteration of for contact in read_file:, so subsequent iterations never go through the second file again. A file object can only be iterated once unless you rewind it. You can instead load the lines of each file once before the loops, for example with lines2 = read_file1.readlines(), and iterate over the resulting lists:
from itertools import cycle

def cross_reference(files):
    set_of_users = set()
    n = len(files)
    files = cycle(files)
    for index in range(n):
        # Read each file fully into a list so it can be scanned repeatedly.
        with open(next(files), mode='r') as read_file:
            lines1 = read_file.readlines()
        with open(next(files), mode='r') as read_file1:
            lines2 = read_file1.readlines()
        for contact in lines1:
            for contact1 in lines2:
                if contact == contact1:
                    set_of_users.add(contact)
                    break
    print(set_of_users)
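To see the exhaustion effect in isolation, here is a small sketch that uses io.StringIO as a stand-in for open files (an assumption made so the snippet needs no files on disk; text files opened with open() iterate line by line in the same one-pass way). The data and variable names are illustrative only.

```python
import io

# io.StringIO stands in for an open text file; both are one-pass line iterators.
outer = io.StringIO('a\nb\nc\n')
inner = io.StringIO('c\nb\na\n')

passes = []
for left in outer:
    # Collect whatever the inner iterator still yields on this pass.
    passes.append([right for right in inner])

print(passes)  # [['c\n', 'b\n', 'a\n'], [], []]
```

Only the first pass sees any lines from the inner object. Calling inner.seek(0) before each inner loop would rewind it, but reading the lines into a list once, as above, avoids re-reading the file.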
Upvotes: 0