X.F.Liu

Reputation: 11

How to use the `yield Request()` to control the FOR Loop in scrapy?

I'm setting up a Scrapy project. In my project there is a for loop that should be controlled by the crawl results, but `yield Request()` does not return a value. So how do I control the for loop in Scrapy? See the code below for details:

def parse_area_detail(self, response):
    for page in range(100):
        page_url = parse.urljoin(response.url, 'pg' + str(page + 1))
        yield Request(page_url, callback=self.parse_detail)
        # the parse_detail function will get a title list. If the title
        # list is empty, the for loop should be stopped.

def parse_detail(self, response):
    title_list = response.xpath("//div[@class='title']/a/text()").extract()

The parse_detail function gets a title list. I expect the for loop to stop once the title list is empty, but I know my code doesn't work like that. How do I change it so that it does?

Upvotes: 1

Views: 680

Answers (1)

Thiago Curvelo

Reputation: 3740

You could request the next page only after parsing the current one. That way you can decide whether to continue based on whether the title list is empty. E.g.:

start_urls = ['http://example.com/?p=1']
base_url = 'http://example.com/?p={}'

def parse(self, response):
    title_list = response.xpath("//div[@class='title']/a/text()").extract()

    # ... do what you want to do with the list, then ...

    if title_list:
        next_page = response.meta.get('page', 1) + 1
        yield Request(
            self.base_url.format(next_page),
            meta={'page': next_page},
            callback=self.parse
        )
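The control flow above can be sketched without Scrapy at all: request one page, stop as soon as a page comes back with no titles. In this minimal sketch, `fetch_titles` is a hypothetical stand-in for the download-plus-XPath step, and the fake site dict simulates three pages of results followed by an empty page.

```python
def crawl_pages(fetch_titles, base_url, start_page=1, max_pages=100):
    """Fetch pages sequentially, stopping at the first empty title list.

    `fetch_titles` is a placeholder for Scrapy's download + extraction step:
    it takes a URL and returns a list of title strings.
    """
    titles = []
    page = start_page
    while page <= max_pages:
        page_titles = fetch_titles(base_url.format(page))
        if not page_titles:      # empty list -> stop, like the spider above
            break
        titles.extend(page_titles)
        page += 1
    return titles

# Fake fetcher: pages 1-3 have titles, page 4 is empty.
fake_site = {1: ['a', 'b'], 2: ['c'], 3: ['d']}
fetch = lambda url: fake_site.get(int(url.rsplit('=', 1)[1]), [])
result = crawl_pages(fetch, 'http://example.com/?p={}')
```

The key point is the same as in the spider: the decision to fetch page N+1 is made *after* page N has been parsed, rather than queuing all 100 requests up front.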

Upvotes: 3
