Reputation: 11
I'm setting up a Scrapy project. In my project there is a for-loop that should be controlled by the crawl results, but `yield Request()`
does not return a value. So how do I control the for-loop in Scrapy? See the code below for more details:
def parse_area_detail(self, response):
    for page in range(100):
        page_url = parse.urljoin(response.url, 'pg' + str(page + 1))
        yield Request(page_url, callback=self.parse_detail)
        # the parse_detail function will get a title list. If the title
        # list is empty, the for-loop should be stopped.
def parse_detail(self, response):
    title_list = response.xpath("//div[@class='title']/a/text()").extract()
The parse_detail function will get a title list. I expect that if the title list is empty, the for-loop will stop, but I know my code doesn't work like that. How do I change my code to make it work?
Upvotes: 1
Views: 680
Reputation: 3740
You could request the next page after parsing the current one, and decide to continue only if the title list is not empty. E.g.:
start_urls = ['http://example.com/?p=1']
base_url = 'http://example.com/?p={}'

def parse(self, response):
    title_list = response.xpath("//div[@class='title']/a/text()").extract()
    # ... do what you want to do with the list, then ...
    if title_list:
        next_page = response.meta.get('page', 1) + 1
        yield Request(
            self.base_url.format(next_page),
            meta={'page': next_page},
            callback=self.parse,
        )
Upvotes: 3