Search for strings listed in one file from another text file?

Question

I want to find the strings listed in list.txt (one string per line) in another text file. If a string is found, I want to print 'string,one_sentence'; if it is not found, 'string,another_sentence'. I'm using the following code, but it only finds the last string from list.txt. I cannot understand what the reason could be.

data = open('c:/tmp/textfile.TXT').read()
for x in open('c:/tmp/list.txt').readlines():
    if x in data:
        print(x,',one_sentence')
    else:
        print(x,',another_sentence')

Dr. Jan-Philip Gehrcke · Accepted Answer

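When you read a file with readlines(), each element of the resulting list keeps its trailing newline character. The string you search for is then, for example, 'foo\n' instead of 'foo', so it only matches if the text file contains it directly followed by a line break. This is most likely why you get fewer matches than expected. The last line of list.txt still matches because, when the file does not end with a newline, that line has no trailing newline character.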
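To see the effect, here is a small sketch that uses io.StringIO as a stand-in for list.txt. The sample strings are made up for illustration, and the simulated file deliberately does not end with a newline.

```python
import io

# Simulated contents of list.txt: the last line has no trailing newline.
list_txt = io.StringIO("apple\nbanana\ncherry")
haystack = "I ate an apple, a banana and a cherry."

lines = list_txt.readlines()
print(lines)  # ['apple\n', 'banana\n', 'cherry']

for x in lines:
    # 'apple\n' and 'banana\n' are not substrings of the haystack
    # (it contains no newline), but 'cherry' is.
    print(repr(x), x in haystack)
```

Only the last entry is reported as found, which matches the symptom described in the question.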

Instead of writing

for x in open('c:/tmp/list.txt').readlines():

write

for x in (s.strip() for s in open('c:/tmp/list.txt').readlines()):

This removes leading and trailing whitespace, including the trailing newline character, from each line before it is used as a search string.
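Note that str.strip() also removes surrounding spaces, which may not be wanted if whitespace inside the search strings is significant. The following sketch compares it with two alternatives from the standard library, again using io.StringIO with made-up contents in place of list.txt.

```python
import io

list_txt = io.StringIO("apple\nbanana\n  cherry  \n")

# str.strip() removes all leading/trailing whitespace: '\n' and spaces.
stripped = [s.strip() for s in list_txt.readlines()]

# str.rstrip('\n') removes only the trailing newline and keeps the spaces.
list_txt.seek(0)  # rewind the simulated file before reading it again
rstripped = [s.rstrip('\n') for s in list_txt.readlines()]

# str.splitlines() on the whole text returns the lines without line endings.
split = list_txt.getvalue().splitlines()

print(stripped)   # ['apple', 'banana', 'cherry']
print(rstripped)  # ['apple', 'banana', '  cherry  ']
print(split)      # ['apple', 'banana', '  cherry  ']
```

For a plain one-word-per-line list, strip() is usually the simplest choice.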

In order to consolidate your program, you could do something like this:

import sys

# The with statement closes each file automatically.
with open('c:/tmp/textfile.TXT') as f:
    haystack = f.read()

if not haystack:
    sys.exit("Could not read haystack data :-(")

with open('c:/tmp/list.txt') as f:
    # Strip each line lazily so the trailing newline is not part of the needle.
    for needle in (line.strip() for line in f):
        if needle in haystack:
            print(needle + ',one_sentence')
        else:
            print(needle + ',another_sentence')

I did not want to make drastic changes. The most important difference is that I use a context manager via the with statement, which ensures proper file handling for you (mainly, that the files are closed even if an error occurs). Also, the needle lines are stripped on the fly by a generator expression, so the needle file is read and processed line by line instead of being loaded into memory all at once. Of course, this only makes a difference for large needle files; the haystack file is still read in one piece with f.read().
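One possible refinement, sketched as a hypothetical helper classify_needles (not part of the original answer): it skips blank lines, because '' in haystack is always True and an empty line in list.txt would otherwise be reported as found, and it joins the needle and the suffix with a comma so the output matches the 'string,one_sentence' format exactly. io.StringIO stands in for the real list.txt, and the sample data is made up.

```python
import io
from typing import Iterable, List


def classify_needles(haystack: str, needle_lines: Iterable[str]) -> List[str]:
    """Return 'needle,one_sentence' or 'needle,another_sentence' per needle.

    Hypothetical helper for illustration. Blank lines are skipped because
    the empty string is a substring of every string.
    """
    results = []
    for line in needle_lines:
        needle = line.strip()
        if not needle:
            continue  # skip blank lines instead of reporting a false match
        suffix = 'one_sentence' if needle in haystack else 'another_sentence'
        # Building the string ourselves avoids the space print(a, b) inserts.
        results.append(f"{needle},{suffix}")
    return results


haystack = "I ate an apple, a banana and a cherry."
# io.StringIO stands in for open('c:/tmp/list.txt'); iterating it yields lines.
with io.StringIO("apple\n\nkiwi\ncherry") as f:
    for row in classify_needles(haystack, f):
        print(row)
```

Because the helper accepts any iterable of lines, the same function works with a real file object opened in a with block.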
