ChiseledAbs

Reputation: 2071

How to allow the same request n times at most?

When I send a request to a server, it sometimes returns a webpage other than the expected one, even though the response status is 200. I know I can use this method to request the same URL multiple times:

from scrapy import Request

def parse(self, response):
    try:
        pass  # parsing logic here
    except AttributeError:
        yield Request(response.url, callback=self.parse, dont_filter=True)

But how can I limit the number of times the same URL is requested (say, at most 10 times) to avoid an infinite loop when the page really is what it is?

Upvotes: 4

Views: 115

Answers (1)

willeM_ Van Onsem

Reputation: 477170

You could use functools.partial to call parse with an extra parameter that holds the number of attempts already made. Once that number reaches a certain threshold (here 10), you no longer yield a new Request:

from functools import partial

from scrapy import Request

def parse(self, response, ntimes=0):
    try:
        # parsing logic here
        pass
    except AttributeError:
        if ntimes < 10:
            # retry the same URL, with the attempt counter bumped by one
            yield Request(response.url,
                          callback=partial(self.parse, ntimes=ntimes + 1),
                          dont_filter=True)

So instead of using parse directly as the callback, you wrap it in a partial(...) that sets ntimes to the previous ntimes + 1 (incrementing a "virtual" counter, so to speak). Once ntimes reaches 10, you no longer add a request to the queue.
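To see the counter at work without running Scrapy, here is a minimal plain-Python sketch. FakeRequest, Spider and crawl are hypothetical stand-ins for scrapy.Request, the spider and Scrapy's scheduler (not Scrapy APIs), and the page is assumed to always be the wrong one, so every call hits the AttributeError branch.

```python
from collections import deque
from functools import partial

MAX_RETRIES = 10  # same threshold as in the answer


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: just a URL and a callback."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


class Spider:
    def __init__(self):
        self.fetches = 0  # how many times the URL was "downloaded"

    def parse(self, url, ntimes=0):
        # Simulate a page that is always the wrong one (the AttributeError case).
        try:
            raise AttributeError("unexpected page")
        except AttributeError:
            if ntimes < MAX_RETRIES:
                yield FakeRequest(url, callback=partial(self.parse, ntimes=ntimes + 1))


def crawl(spider, start_url):
    """Tiny scheduler loop: pop a request, 'download' it, queue what the callback yields."""
    queue = deque([FakeRequest(start_url, callback=spider.parse)])
    while queue:
        request = queue.popleft()
        spider.fetches += 1
        queue.extend(request.callback(request.url))
    return spider.fetches


spider = Spider()
print(crawl(spider, "http://example.com/page"))  # 11
```

Note that this makes 11 downloads in total: the original request plus 10 retries (ntimes = 0 through 10). If the limit should include the first request, lower the threshold by one (ntimes < 9).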

ntimes defaults to 0 so that you can still pass parse as a callback without specifying how many times it has been called; in that case parse assumes the URL has not been retried yet.
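A small sketch of how the default and the partial interact; parse here is a hypothetical stand-in that simply reports which ntimes it received.

```python
from functools import partial


def parse(url, ntimes=0):
    """Hypothetical stand-in for the spider's parse callback."""
    return (url, ntimes)


# Referenced directly (e.g. as the callback of a start request): ntimes defaults to 0.
print(parse("http://example.com"))  # ('http://example.com', 0)

# Wrapped the way the retry request wraps it: ntimes is pre-bound.
retry_callback = partial(parse, ntimes=3)
print(retry_callback("http://example.com"))  # ('http://example.com', 3)

# partial keeps the bound keyword arguments, which is handy when debugging.
print(retry_callback.keywords)  # {'ntimes': 3}
```

In Scrapy 1.7 and later, Request also accepts a cb_kwargs dict whose entries are passed to the callback as keyword arguments (e.g. cb_kwargs={'ntimes': ntimes + 1}), which achieves the same thing without partial.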

Upvotes: 1
