Marco Dinatsoli

Reputation: 10570

Python Scrapy: stop the crawl when a condition is met

I want to extract all the data from a website.

I am using Scrapy 0.20.2.

My code is:

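from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.http import Request
from scrapy.selector import Selector

class MySpider(CrawlSpider):
    start_urls = ['TheWebsite']
    rules = [Rule(SgmlLinkExtractor(allow=[r'/?page=\d+']), 'parse')]

    def parse(self, response):
        # Build a selector from the response before querying it.
        sel = Selector(response)
        sites = sel.xpath('MyXPath')
        for site in sites:
            if condition < 8:
                yield Request(Link, meta={'date': Date},
                              callback=self.MyFunction)
            else:
                pass  # Code to stop scrapy goes here.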

The crawler will scrape all the data from URLs that have this form:

Mywebsite?page=INTEGER

But when a specific condition is met, I want to stop crawling. In my code, I want that to happen when the else branch is reached. How can I do that?

Upvotes: 0

Views: 2190

Answers (2)

jonrsharpe

Reputation: 122024

To exit the for loop at that point, use break:

for site in sites:
    if condition < 8:
        # ...
    else:
        break

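This will put you outside the for loop and, since nothing follows the loop, parse then finishes. You can also use return instead of break, which exits the function immediately. Note that parse is a generator (it uses yield), so under Python 2 only a bare return is allowed there; a return with a value is a syntax error. Unlike return, break also allows you to have further code in your function after the loop: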

for ...:
    if something:
        break
# do something else before finishing

Upvotes: 1

falsetru

Reputation: 369064

Use break to terminate the for loop, or use a return statement to leave the function.

for site in sites:
    if condition < 8:
        yield Request(Link, meta={'date': Date}, callback = self.MyFunction)
    else:
        break
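
Note that break only ends this one call to parse; pages that are already scheduled will still be processed. If the goal is to shut down the whole crawl, one option that is not part of the answer above is to raise Scrapy's CloseSpider exception from the callback. The sketch below is an assumption-laden illustration: parse_items, its integer inputs, and the reason string 'condition_reached' are hypothetical, and the fallback class exists only so the snippet runs where Scrapy is not installed. As far as I know, the spider is closed gracefully, so requests already in flight may still complete.

```python
try:
    from scrapy.exceptions import CloseSpider
except ImportError:
    # Stand-in so this sketch runs without Scrapy installed (assumption).
    class CloseSpider(Exception):
        def __init__(self, reason='cancelled'):
            Exception.__init__(self)
            self.reason = reason


def parse_items(values, limit=8):
    """Hypothetical callback body: yield values below limit, else stop the crawl."""
    for value in values:
        if value < limit:
            yield value
        else:
            # In a real spider callback, this asks Scrapy to close the spider.
            raise CloseSpider('condition_reached')
```

In a real spider the raise would sit in the else branch of parse, in place of break.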

Upvotes: 0
