Reputation: 767
Given a list of item numbers, I am trying to search through a text file containing a list of recent item numbers and identify any that are already in that recent list. I then want to add any items that weren't already in the recent list.
My code is below, but it doesn't seem to find anything in the text file. Why isn't it working?
```python
def filter_recent_items(items):
    recentitems = []
    with open('last 600 items.txt', 'r+') as f:
        for item in items:
            if item['ID'] in f:
                print 'In! --', item['ID']
            else:
                recentitems.append(item['ID'])
                print 'Out ---', item['ID']
        for item in recentitems:
            f.write("%s\n" % item)

items = [ {'ID': 1}, {'ID': 'test2'} ]
filter_recent_items(items)
```
For example, my text file is:

```
test2
test1
1
```

but the above code prints:

```
Out --- 1
Out --- test2
```
Upvotes: 4
Views: 4239
Reputation: 4776
The problem is in how you're checking for the existence of the specified text. In your code, `f` is a file object, used for reading from and writing to a file. So when you check `if str in f`, it's not checking what you think it is (see below for details).
Instead, you need to read in the lines of the file, then iterate through those lines and check each one for the string you need. For example:
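```python
with open('last 600 items.txt', 'r+') as f:
    lines = f.readlines()
    for l in lines:
        # check within each line for the presence of the items;
        # strip the trailing newline before comparing ('test2' is just an example)
        if l.strip() == 'test2':
            print('found', l.strip())
```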
In the above code excerpt, `f.readlines()` uses the file object to read the contents of the file and returns a list of strings, which are the lines within the file. Each of those strings still ends with its newline character.
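To make the point about `readlines()` concrete, here is a small sketch that uses `io.StringIO` as an in-memory stand-in for the text file (an assumption, so the example runs without the real file). It shows that every returned string keeps its trailing newline, and one way to remove it before comparing.

```python
import io

# In-memory stand-in for 'last 600 items.txt', using the contents from the question
fake_file = io.StringIO("test2\ntest1\n1\n")

lines = fake_file.readlines()
print(lines)     # ['test2\n', 'test1\n', '1\n']

# Strip the newline from each line before comparing against an item ID
stripped = [l.rstrip("\n") for l in lines]
print(stripped)  # ['test2', 'test1', '1']
```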
EDITED (credit to Peter Wood)
In Python, when you use the syntax `x in y`, two cases matter here (there is also an older fallback based on `__getitem__`, which doesn't apply to files):
Case 1: Python first checks whether `y` has a `__contains__(x)` method. If so, it returns the result of `y.__contains__(x)`.
Case 2: If, however, `y` does not have a `__contains__` method but does define the `__iter__` method, Python instead uses that method to iterate over the contents of `y` and returns `True` if at any point one of the values being iterated over equals `x`. Otherwise, it returns `False`.
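The two cases can be observed with a pair of toy classes. The class names `WithContains` and `IterOnly` are hypothetical and exist only to illustrate which method `in` ends up using.

```python
class WithContains:
    """Defines __contains__, so `x in obj` calls it directly (Case 1)."""
    def __contains__(self, x):
        print("__contains__ called with", repr(x))
        return x == "test2"

class IterOnly:
    """No __contains__, only __iter__, so `in` falls back to iteration (Case 2)."""
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        return iter(self.values)

print("test2" in WithContains())           # prints the __contains__ message, then True
print("test2" in IterOnly(["test2\n"]))    # False: "test2" != "test2\n"
print("test2\n" in IterOnly(["test2\n"]))  # True: an element compares equal
```

`IterOnly` behaves like a file object here: it only knows how to hand out its elements one at a time, so `in` can only test each element for equality.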
Using your code as the example, at some point it evaluates `"test2" in f`. Here `f` is a file object (see the Python documentation on file objects). File objects belong to Case 2: they don't have `__contains__`, but they do have `__iter__`.
So the code goes through each line and checks whether your input string is equal to any of the lines in the file. Since each line ends with the newline character `\n` (except possibly the last one, if the file doesn't end with a newline), a bare string such as `"test2"` never compares equal to it.
To elaborate: while `"test2" in "test2\n"` would return `True`, the test that's actually being performed here is `"test2" == "test2\n"`, which is `False`.
You can test how this works on your file by hand. For example, to see whether `"test2" in f` should return `True`:
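```python
filename = 'last 600 items.txt'
with open(filename) as f:
    x = iter(f)
    while True:
        try:
            line = next(x)  # next(x) works in Python 2.6+ and Python 3
        except StopIteration:  # raised once every line has been read
            break
        print(repr(line))  # repr() makes the trailing '\n' visible
        print(line == "test2")
```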
You'll notice that it prints out each line (including the newline at the end) and that the result of `line == "test2"` is always `False`.
If, however, we were to try `"test2\n" in f`, the result would be `True`.
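The same behavior can be reproduced without the real file by using `io.StringIO` as a stand-in (an assumption, made so the sketch is self-contained). It also shows a side effect that matters for the question's loop: each `in` test consumes the file iterator, so a later test starts where the previous one stopped.

```python
import io

contents = "test2\ntest1\n1\n"

f = io.StringIO(contents)
print("test2" in f)    # False: no line equals "test2" exactly

f = io.StringIO(contents)
print("test2\n" in f)  # True: the first line is "test2\n"

# Membership testing consumes the iterator. After a failed search the
# file has been read to the end, so even a string that IS a line is missed.
f = io.StringIO(contents)
print("nope" in f)     # False, and f is now exhausted
print("test2\n" in f)  # False: nothing left to iterate over
```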
End Edit
Upvotes: 6
Reputation: 54193
As others have said, `if "somestring" in f` will fail unless the string matches an entire line, trailing newline included. `f` is a file object which, when you iterate over it, produces lines of text. One or more of those LINES might contain your text, so instead you could do:
```python
with open('last 600 items.txt') as f:
    if any("targetstring" in line for line in f):
        print("found")  # success
```
This saves memory compared with the `f.read()` or `f.readlines()` approaches, which both read the whole file into memory before doing anything. `any()` also stops reading at the first matching line.
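Note that `"targetstring" in line` is a substring test, so an ID like `"1"` would also match a line such as `"10"`. If whole-line matches are wanted, a minimal streaming variant could compare each line with its newline removed. The helper name `contains_line` is hypothetical, and `io.StringIO` stands in for the real file.

```python
import io

def contains_line(f, target):
    """Return True if some line of f equals target once its newline is removed.

    Reads the file one line at a time and stops at the first match.
    """
    return any(line.rstrip("\n") == target for line in f)

print(contains_line(io.StringIO("test2\ntest1\n1\n"), "1"))  # True
print(contains_line(io.StringIO("10\n"), "1"))               # False
print(any("1" in line for line in io.StringIO("10\n")))      # True (substring match)
```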
@PeterWood points out in the comments that some of your target strings aren't actually strings: `{'ID': 1}` holds the integer `1`, which never equals the string `"1"` read from the file. You should see to that, too: `all(isinstance(item["ID"], str) for item in items)` should be `True`.
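One simple way to handle this, assuming every ID should be compared by its text form, is to convert each ID with `str()` before comparing it with lines read from the file:

```python
items = [{'ID': 1}, {'ID': 'test2'}]

# The integer 1 never equals the string "1" read from a file
print(1 == "1")  # False

# Converting every ID to str makes the comparison meaningful
ids = [str(item['ID']) for item in items]
print(ids)                                   # ['1', 'test2']
print(all(isinstance(i, str) for i in ids))  # True
```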
Upvotes: 2
Reputation: 77837
Print out your data store, `f`. First of all, I expect that you have embedded newline characters that prevent the items from matching: `"1"` doesn't match `"1\n"`. Second, note that `with open(...)` gives you a file object, which is an iterator, not a list or tuple. You can't scan it multiple times: once one `in` test has read to the end of the file, later tests find nothing. You don't have the data from it until you iterate through it somehow.
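You need code to get all the elements into memory, such as:
```python
with open('last 600 items.txt') as f:
    # Read the whole file once and split it into lines without the '\n'
    content = f.read().split("\n")
for item in items:
    # str() so that an integer ID like 1 can match the line "1"
    if str(item["ID"]) in content:
        print("In! --", item["ID"])
```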
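Combining the points from these answers (strip the newlines, compare IDs as strings, read the file only once), a corrected `filter_recent_items` could look like the sketch below. It is written for Python 3; the `path` parameter, the use of a set, the return value, and the choice of `'a+'` mode are assumptions, not something any of the answers prescribes.

```python
def filter_recent_items(items, path='last 600 items.txt'):
    """Report which item IDs already appear in `path`, then append the new ones.

    Assumes one ID per line and compares IDs by their string form.
    Returns the list of IDs that were appended.
    """
    # 'a+' opens for reading and appending, creating the file if it is missing.
    with open(path, 'a+') as f:
        f.seek(0)                            # rewind so the read starts at the top
        content = f.read()                   # read the file exactly once
        recent = set(content.splitlines())   # lines without their newlines

        new_ids = []
        for item in items:
            item_id = str(item['ID'])        # so that 1 can match the line "1"
            if item_id in recent:
                print('In! --', item_id)
            else:
                print('Out ---', item_id)
                new_ids.append(item_id)
                recent.add(item_id)          # don't append the same ID twice

        # In append mode every write goes to the end of the file.
        if new_ids and content and not content.endswith('\n'):
            f.write('\n')                    # keep the old last line separate
        for item_id in new_ids:
            f.write(item_id + '\n')
    return new_ids
```

With the example file from the question, both `1` and `test2` are reported as `In!`, and only IDs that are not yet listed get appended.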
Upvotes: 1