Reputation: 2327
I want to issue N scrapy requests from start_requests. N is dynamic because I want to loop through all the pages of an API, and I do not know the number of pages beforehand. I only know that once I go past the last page, the API responds with an empty JSON. I want to do something like:
url = "https://example.com?page={}"
def start_requests(self):
page = 0
while True:
page += 1
yield scrapy.Request(url=url.format(page), callback=self.parse)
def parse(self, response, **kwargs):
data = json.loads(response.body)
if 'key' in data:
# parse and yield an item
pass
else:
# do not yield an item and break while loop in start_requests
I do not know how to achieve this. Can I return a value from the callback (instead of yield) when the condition is met?
Upvotes: 1
Views: 174
Reputation: 17355
No, but you can set a class attribute as a flag that tells start_requests to stop. Scrapy consumes the start_requests generator lazily, so the loop condition is re-checked each time the scheduler pulls a new request. For example:
import json

import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"  # every spider needs a unique name
    url = "https://example.com?page={}"
    keep_crawling = True

    def start_requests(self):
        page = 0
        while self.keep_crawling:
            page += 1
            # url is a class attribute, so access it via self
            yield scrapy.Request(url=self.url.format(page), callback=self.parse)

    def parse(self, response, **kwargs):
        data = json.loads(response.body)
        if 'key' in data:
            # parse and yield an item
            pass
        else:
            # empty page reached: stop generating new requests
            self.keep_crawling = False
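Worth noting: because Scrapy schedules requests concurrently, a few extra pages may already be queued or in flight by the time the empty response flips the flag; those will simply reach parse with empty JSON and yield nothing. If even those extra requests are unwanted, a minimal alternative sketch (not part of the original answer; it reuses the question's placeholder URL and 'key' check, and assumes Scrapy 1.7+ for cb_kwargs) is to request only page 1 up front and let each callback schedule the next page:

import json

import scrapy


class ChainedPaginationSpider(scrapy.Spider):
    name = "chained_pagination"  # hypothetical name for illustration
    url = "https://example.com?page={}"

    def start_requests(self):
        # Only the first page is requested eagerly.
        yield scrapy.Request(
            url=self.url.format(1),
            callback=self.parse,
            cb_kwargs={"page": 1},
        )

    def parse(self, response, page, **kwargs):
        data = json.loads(response.body)
        if 'key' in data:
            # parse and yield an item, then chain the next page
            yield scrapy.Request(
                url=self.url.format(page + 1),
                callback=self.parse,
                cb_kwargs={"page": page + 1},
            )
        # else: empty page reached; yielding nothing ends the crawl

The trade-off is that pages are fetched one after another instead of in parallel, but the crawl stops exactly at the first empty page.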
Upvotes: 2