myeewyee

Reputation: 767

Can't find string in text file

Given a list of item numbers, I am trying to search through a text file with a list of recent item numbers, and identify any in this recent list. I then want to add any items that weren't already in the recent list.

My code is below, but it doesn't seem to find anything in the text file. Why isn't it working?

def filter_recent_items(items):
    recentitems = []
    with open('last 600 items.txt', 'r+') as f:
        for item in items:
            if item['ID'] in f:
                print 'In! --', item['ID']
            else:
                recentitems.append(item['ID'])
                print 'Out ---', item['ID']
        for item in recentitems:
            f.write("%s\n" % item)


items = [ {'ID': 1}, {'ID': 'test2'} ]     
filter_recent_items(items)

For example, my text file is:

test2

test1

1

but the above code returns

Out --- 1
Out --- test2

Upvotes: 4

Views: 4239

Answers (3)

xgord

Reputation: 4776

The problem is in how you're checking for the existence of the specified text. In your code f is a file object, used for reading and writing to/from a file. So when you check if

str in f

It's not checking what you think it is. (See below for details.)

Instead, you need to read in the lines of the file, then iterate through those lines and check each one for the string you want. For example:

with open('last 600 items.txt', 'r+') as f:
    lines = f.readlines()
    for l in lines:
        # check within each line for the presence of the items

In the above code excerpt, f.readlines() uses the file object to read the contents of the file and returns a list of strings, which are the lines within the file.
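Putting that together, one possible version of the function might look like the sketch below. It is written for Python 3 (the question uses Python 2 print statements), and the `path` parameter, the conversion of IDs with str(), and the return value are assumptions added for illustration, not part of the original code.

```python
def filter_recent_items(items, path='last 600 items.txt'):
    """Append IDs missing from the file; return the IDs that were added.

    `path` is a hypothetical parameter added for illustration.
    """
    with open(path, 'r+') as f:
        # Read every line once and drop the trailing newline from each.
        recent = {line.rstrip('\n') for line in f}
        new_ids = []
        for item in items:
            item_id = str(item['ID'])  # IDs may be ints; file lines are strings
            if item_id not in recent:
                new_ids.append(item_id)
                recent.add(item_id)
        # Move to the end of the file explicitly before switching to writing.
        f.seek(0, 2)
        for item_id in new_ids:
            f.write('%s\n' % item_id)
    return new_ids
```

This assumes the file already ends with a newline; otherwise the first appended ID would be glued onto the last existing line.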

EDITED (credit to Peter Wood)

Python Membership Details

In Python, when you use the syntax x in y, the result is determined in one of two ways:

Case 1: It first checks to see whether y has a __contains__ method. If so, it returns the result of y.__contains__(x).

Case 2: If however, y does not have a __contains__ method, but does define the __iter__ method, Python instead uses that method to iterate over the contents of y and returns True if at any point one of the values being iterated over equals x. Otherwise, it returns False.
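The two cases can be seen in isolation with a pair of toy classes. The class names WithContains and WithIterOnly are made up for this sketch; they only stand in for "an object with __contains__" and "an object with only __iter__".

```python
class WithContains(object):
    """Hypothetical class: membership is answered by __contains__."""
    def __contains__(self, x):
        # Case 1: `x in obj` calls this method directly.
        return x == 'test2'


class WithIterOnly(object):
    """Hypothetical class: no __contains__, only __iter__."""
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        # Case 2: `x in obj` iterates and compares each value with ==.
        return iter(self.values)


case1 = 'test2' in WithContains()
case2_exact = 'test2\n' in WithIterOnly(['test1\n', 'test2\n'])
case2_partial = 'test2' in WithIterOnly(['test1\n', 'test2\n'])
```

Here case1 and case2_exact are True, while case2_partial is False, because no value yielded by the iterator is equal to 'test2' as a whole.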

If we use your code as the example, at a certain point, it is checking the truth of the statement "test2" in f. Here f is an object of type file (see the Python file object documentation). File objects belong to Case 2 (i.e. they don't have __contains__, but they do have __iter__).

So the code will go through each line and see whether your input string is equal to any of the lines in the file. And since each line ends with the char \n, that comparison is never going to be True.

To elaborate, while "test2" in "test2\n" would return True, the test that's actually being performed here is: "test2" == "test2\n", which is False.

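You can test how this works on your file by hand. For example, if we want to see if "test2" in f should return True:

with open('last 600 items.txt') as f:
    x = iter(f)
    while True:
        try:
            # next(x) yields the same lines that `in` compares against
            line = next(x)
        except StopIteration:
            # raised once the file has no more lines
            break
        print(line)
        print(line == "test2")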

You'll notice that it prints out each line (including the newline at the end) and that the result of line == "test2" is always False.

If however we were to try: "test2\n" in f, the result would be True.

End Edit

Upvotes: 6

Adam Smith

Reputation: 54193

As others have said, if "somestring" in f will always fail. f is a file object which, when you iterate over it, produces lines of text. One or more of those LINES might contain your text, so instead you could do:

if any("targetstring" in line for line in f):
    # success

This saves memory compared with the f.read() or f.readlines() approaches, which both load the whole file into memory before doing anything.
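Note that "targetstring" in line is a substring test, so a target of "1" would also match a line such as "21". If whole-line matches are wanted, a variant along these lines may be closer to the intent. The helper name file_has_line is hypothetical, and the sketch assumes Python 3.

```python
def file_has_line(path, target):
    """Hypothetical helper: True if any line equals `target`, ignoring the line ending."""
    with open(path) as f:
        # The generator stops at the first match and holds only one line at a time.
        return any(line.rstrip('\n') == target for line in f)
```

Each call reopens the file, so for many lookups against the same file it is probably cheaper to read the lines into a set once.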

@PeterWood points out in the comments that some of your target strings aren't actually strings. You should see to that, too. all(isinstance(item["ID"], str) for item in items) should be True.
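A small sketch of one way to do that, assuming it is acceptable to treat every ID as its string form (so the integer 1 becomes "1"). The variable names normalized and all_strings are illustrative only, and the isinstance check against str assumes Python 3.

```python
items = [{'ID': 1}, {'ID': 'test2'}]

# Copy each item, replacing its ID with the string form of that ID.
normalized = [dict(item, ID=str(item['ID'])) for item in items]

# The check from above now holds for the normalized list.
all_strings = all(isinstance(item['ID'], str) for item in normalized)
```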

Upvotes: 2

Prune

Reputation: 77837

Print out your data store, f. First of all, I expect that you have embedded newline characters that prevent the items from matching: "1" doesn't match "1\n". Second, note that with open(...) gives you a file object that behaves like an iterator, not a list or tuple. You can't scan it multiple times without rewinding it, and you don't have the data from it until you iterate through it somehow.
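The one-pass behaviour can be reproduced without touching the real file. In this sketch io.StringIO stands in for the opened text file (an assumption made to keep the example self-contained), and the variable names are illustrative.

```python
import io

# io.StringIO stands in for the opened text file here.
f = io.StringIO('test2\ntest1\n1\n')

first = 'test1\n' in f    # consumes lines up to and including the match
second = 'test2\n' in f   # 'test2\n' was already consumed, so the scan misses it
f.seek(0)                 # rewind to the start
third = 'test2\n' in f    # found again after rewinding
```

Here first is True, second is False and third is True, which is why repeated in checks against the same open file give misleading results.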

You need code to get all the elements into memory, such as

content = f.read().split("\n")
for item in items:
    if item["ID"] in content:
        pass  # this ID is already in the file

Upvotes: 1
