Reputation: 170
I am trying to search filetwo's contents to see whether it already contains the given search term (a line from fileone). If it does, nothing should happen; if it does not, I want to append that line to filetwo.
fileone.txt (two lines)
[('123', 'aaa')]
[('900', 'abc')]
filetwo.txt
[('123', 'aaa')]
[('999', 'zzz')]
My code below adds the lines to filetwo even if they are duplicates. I cannot figure this out!
with open('fileone.txt', 'r') as f:
    seen = open('filetwo.txt', 'a+')
    for line in f:
        if line in seen:
            print(line + 'is a duplicate')
        else:
            seen.write(line)
f.close()
seen.close()
Upvotes: 1
Views: 144
Reputation: 365807
You can't just do if line in seen: to search the whole seen file for the given line. A membership test on a file object iterates from the current file position, so it only searches the rest of the file, and since a file opened in 'a+' mode starts at the end, you're searching over nothing. And even if you solved that problem, it would still require a linear search over the whole file for each line, which would be very slow.
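To make the file-position point concrete, here is a minimal sketch, assuming Python 3. The helper name lines_visible_in_append_mode and the temporary file are made up for illustration. It reads a file right after opening it in 'a+' mode, then again after seeking back to the start.

```python
import os
import tempfile

def lines_visible_in_append_mode(path):
    """Return (lines read right after opening in 'a+', lines read after seek(0))."""
    with open(path, 'a+') as f:
        before = list(f)   # the position starts at the end, so nothing is read
        f.seek(0)          # rewind to the beginning of the file
        after = list(f)    # now every line is visible
    return before, after

# Hypothetical demo file holding the two lines from filetwo.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'filetwo.txt')
    with open(path, 'w') as f:
        f.write("[('123', 'aaa')]\n[('999', 'zzz')]\n")
    before, after = lines_visible_in_append_mode(path)

print(before)  # lines seen without rewinding
print(after)   # lines seen after seek(0)
```

Rewinding before every membership test would make the original loop find its duplicates, but it would still re-read the whole file once per input line.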
The simplest thing to do is to keep track of all the lines seen, e.g., with a set:
with open('filetwo.txt') as f:
    seen = set(f)
with open('fileone.txt') as fin, open('filetwo.txt', 'a+') as fout:
    for line in fin:
        if line in seen:
            print(line + 'is a duplicate')
        else:
            fout.write(line)
            seen.add(line)
Notice that I'm pre-filling seen with all of the lines in filetwo.txt before we start, and then adding each new line to seen as we go along. That avoids having to re-read filetwo.txt over and over again: we know what we're writing to it, so just remember it.
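One edge case is worth guarding against: if the last line of filetwo.txt has no trailing newline, it will not compare equal to the same line read from fileone.txt, and the first appended line will be glued onto it. The following is a sketch of one way to handle that, not the only way. The function name append_new_lines and the temporary demo files are assumptions for illustration. It compares lines with the newline stripped.

```python
import os
import tempfile

def append_new_lines(src_path, dst_path):
    """Append lines of src_path that dst_path lacks; return the lines added."""
    seen = set()
    last = ''
    with open(dst_path) as f:
        for last in f:
            seen.add(last.rstrip('\n'))   # compare without the line ending
    # True when the destination has content that does not end in a newline.
    needs_newline = bool(last) and not last.endswith('\n')

    added = []
    with open(src_path) as fin, open(dst_path, 'a') as fout:
        for line in fin:
            key = line.rstrip('\n')
            if key in seen:
                continue                  # duplicate: leave the file alone
            if needs_newline:
                fout.write('\n')          # finish the unterminated last line
                needs_newline = False
            fout.write(key + '\n')
            seen.add(key)
            added.append(key)
    return added

# Demo with the sample data; filetwo.txt deliberately lacks a final newline.
with tempfile.TemporaryDirectory() as tmp:
    one = os.path.join(tmp, 'fileone.txt')
    two = os.path.join(tmp, 'filetwo.txt')
    with open(one, 'w') as f:
        f.write("[('123', 'aaa')]\n[('900', 'abc')]\n")
    with open(two, 'w') as f:
        f.write("[('123', 'aaa')]\n[('999', 'zzz')]")
    added = append_new_lines(one, two)
    with open(two) as f:
        final_text = f.read()

print(added)
print(final_text)
```

Because the set is updated as lines are written, duplicates inside fileone.txt itself are also written only once.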
Upvotes: 2