Webdev

Reputation: 159

Scrapy - cannot list deeper links

I need to create a list of website URLs. I use Scrapy 2.3.0 for this. The problem is that the result ('item_scraped_count') is 63 links, but I know there are more.

Is there any way to process deeper levels and pick up those URLs?

My code is below:

from scrapy.spiders import CrawlSpider
from scrapy.spiders import Rule
from scrapy.linkextractors import LinkExtractor

from scrapy import Item
from scrapy import Field


class UrlItem(Item):
    url = Field()


class RetriveUrl(CrawlSpider):
    name = 'retrive_url'
    allowed_domains = ['example.com']
    start_urls = ['https://www.example.com']

    rules = (
        Rule(LinkExtractor(), callback='parse_url'),
    )

    def parse_url(self, response):
        item = UrlItem()
        item['url'] = response.url

        return item

Upvotes: 1

Views: 38

Answers (1)

Thiago Curvelo

Reputation: 3740

You should allow the crawler to follow links to deeper levels. Try this:

Rule(LinkExtractor(), callback='parse_url', follow=True),

follow is a boolean which specifies if links should be followed from each response extracted with this rule. If callback is None, follow defaults to True; otherwise it defaults to False.

(From the Scrapy docs)
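For completeness, here is your spider with that one-line change applied. This is a minimal sketch: everything besides the added follow=True argument is unchanged from your question.

from scrapy.spiders import CrawlSpider
from scrapy.spiders import Rule
from scrapy.linkextractors import LinkExtractor

from scrapy import Item
from scrapy import Field


class UrlItem(Item):
    url = Field()


class RetriveUrl(CrawlSpider):
    name = 'retrive_url'
    allowed_domains = ['example.com']
    start_urls = ['https://www.example.com']

    rules = (
        # follow=True tells the crawler to keep extracting links from
        # every response matched by this rule, not just the start page.
        Rule(LinkExtractor(), callback='parse_url', follow=True),
    )

    def parse_url(self, response):
        item = UrlItem()
        item['url'] = response.url
        return item

With that change, running scrapy crawl retrive_url should report a higher item_scraped_count, since the spider now recurses into every page it discovers within allowed_domains.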

Upvotes: 3
