shiteatlife

Reputation: 67

Scrapy LinkExtractor or Rule not working

I have been trying to get Scrapy's LinkExtractor to work, but to no avail. I want it to find any links and then call a different method that just prints something out to show it's working.

This is my spider:

from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor


class TestSpider(CrawlSpider):
    name = 'spi'
    allowed_domains = ['https://www.reddit.com/']
    start_urls = ['https://www.reddit.com/']

    rules = [
        Rule(LinkExtractor(allow=()),
             callback='detail', follow=True)
    ]

    def parse(self, response):
        print("parsed!")

    def detail(self, response):
        print('parsed detail!')

When I run the spider with the command "scrapy crawl spi", I get "parsed!", so it only reaches the parse method and never the detail callback.

Upvotes: 1

Views: 1233

Answers (2)

scriptso

Reputation: 677


There's no need to comment out parse, but rename it to the default parse_item, or whatever you'd like. The point is that parse is already a logic function built into a CrawlSpider.

In the future, when generating a spider, try "scrapy genspider -t crawl SPIDERNAME BASEURL" (no http(s)://www. prefix, i.e. site.com).
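Putting that together, here is a minimal sketch of the corrected spider, assuming the same start URL as the question: parse is renamed to detail so CrawlSpider's built-in parse stays intact, and allowed_domains holds the bare domain instead of a full URL:

from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor


class TestSpider(CrawlSpider):
    name = 'spi'
    # Domain only, no scheme or path; a full URL here can make the
    # offsite filter drop every extracted link
    allowed_domains = ['reddit.com']
    start_urls = ['https://www.reddit.com/']

    rules = [
        Rule(LinkExtractor(allow=()),
             callback='detail', follow=True)
    ]

    def detail(self, response):
        # Now invoked for every link the rule extracts
        print('parsed detail!')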

Upvotes: 1

Tomáš Linhart

Reputation: 10210

If you are using the CrawlSpider base class for your spider, avoid overriding the parse method, as CrawlSpider uses it internally to implement its crawling logic; overriding it breaks the rule processing. Read the warning in the documentation.
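To illustrate what that warning is about, this is roughly what CrawlSpider's own parse looks like in the Scrapy 1.x source (a simplified excerpt, not the exact code); defining parse in a subclass replaces it and silently disables the rules:

class CrawlSpider(Spider):

    def parse(self, response):
        # The base class routes every response through the rules here,
        # which is what schedules the follow-up requests and callbacks
        return self._parse_response(response, self.parse_start_url,
                                    cb_kwargs={}, follow=True)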

Upvotes: 3
