Marat
Marat

Reputation: 167

Search for strings listed in one file from another text file?

I want to find strings listed in list.txt (one string per line) in another text file in case I found it print 'string,one_sentence' in case didn't find 'string,another_sentence'. I'm using following code, but it is finding only last string in the strings list from file list.txt. Cannot understand what could be the reason?

data = open('c:/tmp/textfile.TXT').read()
for x in open('c:/tmp/list.txt').readlines():
    if x in data:
        print(x,',one_sentence')
    else:
        print(x,',another_sentence')

Upvotes: 1

Views: 8097

Answers (2)

Dr. Jan-Philip Gehrcke
Dr. Jan-Philip Gehrcke

Reputation: 35771

When you read a file with readlines(), the resulting list elements do have a trailing newline characters. Likely, these are the reason why you have less matches than you expected.

Instead of writing

for x in list:

write

for x in (s.strip() for s in list):

This removes leading and trailing whitespace from the strings in list. Hence, it removes trailing newline characters from the strings.

In order to consolidate your program, you could do something like this:

with open('c:/tmp/textfile.TXT') as f:
    haystack = f.read()

if not haystack:
    sys.exit("Could not read haystack data :-(")

with open('c:/tmp/list.txt') as f:
    for needle in (line.strip() for line in f):
        if needle in haystack:
            print(needle, ',one_sentence')
        else:
            print(needle, ',another_sentence')

I did not want to make too drastic changes. The most important difference is that I am using the context manager here via the with statement. It ensures proper file handling (mainly closing) for you. Also, the 'needle' lines are stripped on the fly using a generator expression. The above approach reads and processes the needle file line by line instead of loading the whole file into memory at once. Of course, this only makes a difference for large files.

Upvotes: 5

irrenhaus3
irrenhaus3

Reputation: 309

readlines() keeps a newline character at the end of each string read from your list file. Call strip() on those strings to remove those (and every other whitespace) characters.

Upvotes: 0

Related Questions