I'm trying to get Scrapy 0.12 to use a different "maximum depth" setting for different URLs in the spider's start_urls variable.
If I understand the documentation correctly, there is no way to do this, because the DEPTH_LIMIT setting is global for the entire framework and there is no notion of "requests originating from the initial one".
Is there a way to circumvent this? Is it possible to have multiple instances of the same spider, each initialized with a different starting URL and its own depth limit?
Upvotes: 1
Views: 922
Reputation: 59604
Sorry, it looks like I didn't understand your question correctly from the beginning. Correcting my answer:
Responses have a depth key in meta. You can check it and take appropriate action.
from scrapy.spider import BaseSpider
from scrapy.http import Request

class MySpider(BaseSpider):

    def make_requests_from_url(self, url):
        # remember which start url this request (and everything crawled from it) came from
        return Request(url, dont_filter=True, meta={'start_url': url})

    def parse(self, response):
        if response.meta['start_url'] == '???' and response.meta['depth'] > 10:
            # do something here for exceeding the limit for this start url
            pass
        else:
            # find links and yield requests for them, passing the start url along
            yield Request(other_url, meta={'start_url': response.meta['start_url']})
http://doc.scrapy.org/en/0.12/topics/spiders.html#scrapy.spider.BaseSpider.make_requests_from_url
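As a rough sketch of how this could be generalized to per-start-url limits (the depth_limits mapping, the example URLs and the extract_links helper below are hypothetical, not part of the Scrapy API):

from scrapy.spider import BaseSpider
from scrapy.http import Request

class PerUrlDepthSpider(BaseSpider):
    name = 'per_url_depth'

    # hypothetical mapping of start url -> maximum depth allowed for it
    depth_limits = {
        'http://example.com/': 3,
        'http://example.org/': 10,
    }
    start_urls = list(depth_limits)

    def make_requests_from_url(self, url):
        # tag the whole crawl branch with the start url it originated from
        return Request(url, dont_filter=True, meta={'start_url': url})

    def parse(self, response):
        start_url = response.meta['start_url']
        # 'depth' is set by the built-in DepthMiddleware; start requests default to 0
        if response.meta.get('depth', 0) >= self.depth_limits[start_url]:
            return  # stop following links from this branch
        for url in self.extract_links(response):  # extract_links is a hypothetical helper
            yield Request(url, callback=self.parse, meta={'start_url': start_url})

With something like this, each branch of the crawl stops following links once it reaches the depth configured for its own start url, and the global DEPTH_LIMIT setting can stay unset or act as an overall upper bound.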
Upvotes: 1