Jeong Kim

Reputation: 526

How to get depth of each crawler with Scrapy

Is there a way to keep track of each crawler's depth?

I am recursively crawling some websites.

My setup is similar to the code below.

import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:  # placeholder for the actual check
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        next_crawl_depth = response.meta['depth'] + 1
        if condition_is_satisfied:  # placeholder for the actual check
            with open(filename, "a") as file:
                file.write(...)  # record depth and URL here
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})

This approach doesn't work.

For example, I would like to record each crawler's activity like this:

crawler depth1 URL1
crawler depth2 URL2
...

Thank you in advance.

Upvotes: 1

Views: 644

Answers (1)

Norbert Herman

Reputation: 166

I think you are almost there. Please try this code.

import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:  # placeholder for the actual check
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        cur_crawl_depth = response.meta['depth']
        next_crawl_depth = cur_crawl_depth + 1
        if condition_is_satisfied:  # placeholder for the actual check
            # open in append mode so every visit is recorded, not just the last one
            with open(filename, "a") as f:
                f.write(f"crawler depth{cur_crawl_depth} {response.url}\n")
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})
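
As a side note: Scrapy's built-in DepthMiddleware (enabled by default) already tracks the depth of every request and exposes it as response.meta['depth'] (0 for start requests, +1 for each follow-up), so you may not need to pass the depth along yourself. A minimal sketch, assuming a hypothetical spider name, start URL and log file:

import scrapy

class DepthLogSpider(scrapy.Spider):
    # Hypothetical spider name, start URL and log file, for illustration only.
    name = "depth_log"
    start_urls = ["https://example.com"]

    def parse(self, response):
        # DepthMiddleware (on by default) records the depth of the current
        # response in its meta: 0 for start requests, +1 per follow-up.
        depth = response.meta.get("depth", 0)
        with open("depth_log.txt", "a") as f:
            f.write(f"crawler depth{depth} {response.url}\n")

        # Follow every link on the page; Scrapy increments the depth itself.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)

Running it with scrapy crawl depth_log would append lines like "crawler depth0 https://example.com" to depth_log.txt, and the DEPTH_LIMIT setting can be used to cap how deep the crawl goes.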

Upvotes: 1
