Reputation: 2071
When I request a page from a server, it sometimes replies with a webpage that isn't the expected one, even though the response status is 200. I know I can use this method to request the same url again:
from scrapy import Request

def parse(self, response):
    try:
        # parsing logic here
        pass
    except AttributeError:
        yield Request(response.url, callback=self.parse, dont_filter=True)
But how would one limit the number of times, say 10, that the same url can be requested, to avoid an infinite loop when the webpage really is the intended one?
Upvotes: 4
Views: 115
Reputation: 477170
Well, you could use functools.partial to call parse with an extra parameter holding the number of attempts already made. If that count is at or above a certain threshold (here 10), you do not yield a new Request. So:
from functools import partial

from scrapy import Request

def parse(self, response, ntimes=0):
    try:
        # parsing logic here
        pass
    except AttributeError:
        if ntimes < 10:
            yield Request(response.url,
                          callback=partial(self.parse, ntimes=ntimes + 1),
                          dont_filter=True)
So here, instead of using parse as the callback directly, you wrap a partial(..) around it that sets ntimes to the previous ntimes + 1 (so you increment the "virtual" counter, so to speak). Once ntimes is 10 or higher, you no longer add a request to the queue.
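The counter mechanics can be demonstrated without Scrapy; this is a minimal sketch where a plain function and a string stand in for the spider callback and the response, and each "retry" rebuilds the partial with the incremented count, exactly as the answer does:

```python
from functools import partial

def parse(response, ntimes=0):
    """Stand-in for the spider callback; just returns the attempt count."""
    return ntimes

# Start with the bare function (ntimes defaults to 0), then re-wrap
# it on every simulated retry with ntimes incremented by one.
callback = parse
for _ in range(3):
    ntimes = callback("fake response")
    callback = partial(parse, ntimes=ntimes + 1)

print(ntimes)  # 2
```

Each partial is a fresh object carrying its own bound ntimes keyword, so no state is shared between requests.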
ntimes defaults to 0, so you can still use a plain reference to parse without specifying how many times it has been called (in that case parse "assumes" the url has not been requested before).
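As an alternative (an assumption on my part, not part of the answer above), Scrapy requests also carry a meta dict, so the counter could live on the request instead of the callback; the bounded-retry logic itself is just this, shown here with a plain dict standing in for request.meta:

```python
MAX_RETRIES = 10  # same threshold as in the answer

def should_retry(meta):
    """Check and bump a retry counter stored in a request's meta dict."""
    retries = meta.get("retry_times", 0)
    if retries >= MAX_RETRIES:
        return False
    meta["retry_times"] = retries + 1
    return True

# Simulate a page that never parses correctly: the loop stops
# as soon as the counter reaches MAX_RETRIES.
meta = {}
attempts = 0
while should_retry(meta):
    attempts += 1

print(attempts)  # 10
```

The upside of a meta-style counter is that the callback signature stays unchanged; the partial approach keeps the state visible in the callback itself. Either way the url is retried at most a fixed number of times.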
Upvotes: 1