Reputation: 526
Is there a way to keep track of each crawler's depth?
I am recursively crawling some websites. My setup is similar to the code below.
import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if (condition is satisfied):
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        next_crawl_depth = response.meta['depth'] + 1
        if (condition is satisfied):
            with open(filename, "a") as file:
                file.write(...)  # record depth and URL
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})
This approach doesn't work. I would like to record each crawler's activity like this:
crawler depth1 URL1
crawler depth2 URL2
...
Thank you in advance.
Upvotes: 1
Views: 644
Reputation: 166
I think you are almost there. Try the code below: record the current depth (not the incremented one) for each page, and open the log file in append mode so earlier records are not overwritten.
import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if (condition is satisfied):
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        # depth of the page being parsed right now
        cur_crawl_depth = response.meta['depth']
        next_crawl_depth = cur_crawl_depth + 1
        if (condition is satisfied):
            # append mode, so each page adds a record instead of overwriting the file
            with open(filename, "a") as f:
                f.write(f"depth {cur_crawl_depth} {response.url}\n")
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})
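For reference, here is a minimal self-contained sketch of the same idea that can be run directly; the spider name, start URL, output file, and max_depth cap are placeholders standing in for your real condition and URLs:

import scrapy

class DepthLogSpider(scrapy.Spider):
    # the spider name, start URL, output file, and depth cap below are
    # made-up placeholders for illustration
    name = "depth_log"
    start_url = "http://example.com"
    log_file = "crawl_log.txt"
    max_depth = 3

    def start_requests(self):
        # seed the crawl at depth 1, as in the code above
        yield scrapy.Request(url=self.start_url,
                             callback=self.parse,
                             meta={'depth': 1})

    def parse(self, response):
        cur_crawl_depth = response.meta['depth']
        # append one record per crawled page: its depth and its URL
        with open(self.log_file, "a") as f:
            f.write(f"depth {cur_crawl_depth} {response.url}\n")
        # keep following links until the placeholder depth cap is reached
        if cur_crawl_depth < self.max_depth:
            for href in response.css("a::attr(href)").getall():
                yield response.follow(href,
                                      callback=self.parse,
                                      meta={'depth': cur_crawl_depth + 1})

Running it with scrapy runspider appends one line per crawled page in the "depth N URL" format you described.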
Upvotes: 1