Reputation: 584
I'm working with Scrapy and trying to use a spider to crawl a whole website, but I get no results in my terminal.
PS: I run Scrapy from a script rather than from the command line.
This is my code:
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = 'website.com'
    allowed_domains = ['website.com']
    start_urls = ['http://www.website.com']

    rules = (
        # Extract all links except those matching 'subsection.php'
        # and follow them (since no callback means follow=True by default).
        Rule(LinkExtractor(allow=('/', ), deny=(r'subsection\.php', ))),
    )

    def parse_item(self, response):
        print(response.css('title').extract())

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(MySpider)
process.start()
Upvotes: 0
Views: 33
Reputation: 372
You missed the callback argument.
Simply change
Rule(LinkExtractor(allow=('/', ), deny=(r'subsection\.php', ))),
to
Rule(LinkExtractor(allow=('/', ), deny=(r'subsection\.php', )), callback='parse_item', follow=True),
According to the CrawlSpider docs, you forgot to pass the callback argument to your Rule; without it, extracted links are followed but their responses are never parsed, so nothing is printed. Note also that once a callback is set, follow defaults to False, so pass follow=True explicitly to keep crawling the whole site.
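Put together, here is a minimal sketch of the corrected spider; website.com and the user agent are just the placeholders from the question:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = 'website.com'
    allowed_domains = ['website.com']
    start_urls = ['http://www.website.com']

    rules = (
        # callback routes each matched response to parse_item;
        # follow=True keeps extracting links from those pages too.
        Rule(LinkExtractor(allow=('/', ), deny=(r'subsection\.php', )),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        print(response.css('title').extract())

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(MySpider)
process.start()

With this change, every crawled page is passed through parse_item, so the titles should show up in your terminal.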
Upvotes: 1