Reputation: 67
I have been trying to get Scrapy's LinkExtractor to work, but to no avail. I want it to find any links and then call a different method that just prints something out to show it's working.
This is my spider:
from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor

class TestSpider(CrawlSpider):
    name = 'spi'
    allowed_domains = ['https://www.reddit.com/']
    start_urls = ['https://www.reddit.com/']

    rules = [
        Rule(LinkExtractor(allow=()),
             callback='detail', follow=True)
    ]

    def parse(self, response):
        print("parsed!")

    def detail(self, response):
        print('parsed detail!')
When I run the spider with "scrapy crawl spi", I only get "parsed!", so it only reaches the parse method and never the detail callback.
Upvotes: 1
Views: 1233
Reputation: 677
There's no need to comment out parse, but rename it to the default parse_item, or whatever you'd like. The point is that parse is already a logic method built into CrawlSpider.
In the future, when generating a spider, try "scrapy genspider -t crawl SPIDERNAME BASEURL" (no http(s)://www. prefix, i.e. site.com).
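A minimal sketch of the fixed spider, assuming the goal is just to reach the callback (note that allowed_domains expects bare domains, not URLs):

from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor

class TestSpider(CrawlSpider):
    name = 'spi'
    allowed_domains = ['reddit.com']  # bare domain; no scheme or path
    start_urls = ['https://www.reddit.com/']

    rules = [
        # every extracted link is dispatched to detail()
        Rule(LinkExtractor(allow=()), callback='detail', follow=True)
    ]

    def detail(self, response):
        # reached now that CrawlSpider's built-in parse() is left intact
        print('parsed detail!')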
Upvotes: 1
Reputation: 10210
If you are using the CrawlSpider base class for your spider, avoid overriding the parse method, as it will break the processing. Read the warning in the documentation.
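If you still need custom logic for the responses from start_urls, CrawlSpider exposes parse_start_url as the documented hook to override instead; a minimal sketch (the spider name here is made up):

from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor

class StartDemoSpider(CrawlSpider):
    name = 'start_demo'  # hypothetical name for this sketch
    allowed_domains = ['reddit.com']
    start_urls = ['https://www.reddit.com/']

    rules = [
        Rule(LinkExtractor(allow=()), callback='detail', follow=True)
    ]

    def parse_start_url(self, response):
        # CrawlSpider calls this for start_urls responses,
        # so parse() itself stays untouched
        print('start url fetched:', response.url)

    def detail(self, response):
        print('parsed detail!')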
Upvotes: 3