Wonton_User

Reputation: 170

Checking whether a line from one file is a duplicate in another file (Python)

I am trying to search filetwo's contents to see whether they already contain a given search term (a line from fileone). If the line is already there, the script should do nothing; if it is not, I want it to append the line.

fileone.txt (two lines)

[('123', 'aaa')]

[('900', 'abc')]

filetwo.txt

[('123', 'aaa')]

[('999', 'zzz')]

My code below adds the lines to filetwo even if they are duplicates. I cannot figure this out!

with open('fileone.txt', 'r') as f:
    seen = open('filetwo.txt', 'a+')
    for line in f:
        if line in seen:
            print(line + 'is a duplicate')
        else:
            seen.write(line)

f.close()
seen.close()

Upvotes: 1

Views: 144

Answers (1)

abarnert

Reputation: 365807

if line in seen: does not search the whole seen file for the given line. On a file object, in just iterates over the lines from the current position, so it only searches the rest of the file, and since a file opened in 'a+' mode starts at the end, you're searching over nothing. And even if you solved that problem, it would still require a linear search over the whole file for each line, which would be very slow.
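To see the position problem in isolation, here is a minimal sketch. membership_results is a hypothetical helper name, not something from the question, and the behavior shown is what I'd expect from CPython, which seeks to the end of the file when opening in 'a+' mode. It runs the same in test twice on one handle: once right after opening and once after rewinding with seek(0).

```python
import os
import tempfile


def membership_results(path, needle):
    """Hypothetical helper: test `needle in handle` before and after rewinding."""
    with open(path, 'a+') as seen:
        # CPython positions an 'a+' stream at the end of the file, so
        # iterating from here yields no lines and the test is False.
        found_at_end = needle in seen
        # After rewinding, iteration starts from the first line again.
        seen.seek(0)
        found_after_rewind = needle in seen
    return found_at_end, found_after_rewind


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'filetwo.txt')
        with open(path, 'w') as f:
            f.write("[('123', 'aaa')]\n[('999', 'zzz')]\n")
        print(membership_results(path, "[('123', 'aaa')]\n"))
```

Rewinding before every test would make the original loop find duplicates, but it is still one pass over the file per line, and every search moves the read position again.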

The simplest thing to do is to keep track of all the lines seen, e.g., with a set:

with open('filetwo.txt') as f:
    seen = set(f)

with open('fileone.txt') as fin, open('filetwo.txt', 'a+') as fout:
    for line in fin:
        if line in seen:
            print(line + 'is a duplicate')
        else:
            fout.write(line)
            seen.add(line)

Notice that I'm pre-filling seen with all of the lines in filetwo.txt before we start, and then adding each new line to seen as we go along. That avoids having to re-read filetwo.txt over and over again: we know what we're writing to it, so we just remember it.
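One thing to watch with the set approach: lines read from a file keep their trailing newline, so a last line without one will not compare equal to the same text followed by '\n', and anything appended after it gets glued onto it. Below is a sketch of a variant that compares lines with the newline stripped. append_missing_lines is a hypothetical name, and it assumes both files already exist and are small enough to hold in memory.

```python
def append_missing_lines(src_path, dst_path):
    """Append lines of src_path that dst_path lacks; return the appended lines."""
    seen = set()
    needs_newline = False
    with open(dst_path) as f:
        for raw in f:
            seen.add(raw.rstrip('\n'))
            # Ends up True only if the last line has no trailing newline.
            needs_newline = not raw.endswith('\n')

    added = []
    with open(src_path) as fin, open(dst_path, 'a') as fout:
        for raw in fin:
            line = raw.rstrip('\n')
            if line in seen:
                continue
            if needs_newline:
                # Terminate the existing last line before appending.
                fout.write('\n')
                needs_newline = False
            fout.write(line + '\n')
            seen.add(line)
            added.append(line)
    return added
```

Calling append_missing_lines('fileone.txt', 'filetwo.txt') with the files from the question should append only the [('900', 'abc')] line, and a second call should append nothing.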

Upvotes: 2
