Reputation: 159
I need to create a list of website URLs. I use Scrapy 2.3.0 for this. The problem is that the result ('item_scraped_count') is 63 links, but I know there are more.
Is there any way to crawl deeper levels and pick up those URLs as well?
My code is below:
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy import Item, Field

class UrlItem(Item):
    url = Field()

class RetriveUrl(CrawlSpider):
    name = 'retrive_url'
    allowed_domains = ['example.com']
    start_urls = ['https://www.example.com']

    rules = (
        Rule(LinkExtractor(), callback='parse_url'),
    )

    def parse_url(self, response):
        item = UrlItem()
        item['url'] = response.url
        return item
Upvotes: 1
Views: 38
Reputation: 3740
You should allow the crawl to follow links through to the deeper levels. Try this:
    Rule(LinkExtractor(), callback='parse_url', follow=True),
follow is a boolean which specifies if links should be followed from each response extracted with this rule. If callback is None, follow defaults to True, otherwise it defaults to False.
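Applied to the spider from the question, the rules definition becomes the following (a minimal sketch; only the follow=True flag is new):

    rules = (
        # With follow=True, the LinkExtractor is also run on every page
        # matched by this rule, so the crawl descends past the first level
        # instead of stopping after the links on the start page.
        Rule(LinkExtractor(), callback='parse_url', follow=True),
    )

Since this rule has a callback, follow defaults to False, which is why the crawl was stopping at 63 links: only the pages linked directly from the start URL were being visited.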
Upvotes: 3