Reputation: 371
I'm currently trying to work with the Scrapy framework to simply collect a bunch of URLs that I can store and sort later. However, I can't seem to get the URLs to print or be written to a file in the callback, no matter what I've tried or adapted from other tutorials. Here's what I currently have for my spider class in this particular example, using a small site:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from crawler.items import CrawlerItem
from scrapy import log

class CrawlerSpider(CrawlSpider):
    name = 'crawler'
    allowed_domains = ["glauberkotaki.com"]
    start_urls = ["http://www.glauberkotaki.com"]

    rules = (
        Rule(SgmlLinkExtractor(allow=(), deny=('about.html'))),
        Rule(SgmlLinkExtractor(allow=('about.html')), callback='parseLink', follow="yes"),
    )

    def parseLink(self, response):
        x = HtmlXPathSelector(response)
        print(response.url)
        print("\n")
It crawls all of the pages of this site fine, but it doesn't print anything at all, even when it reaches "www.glauberkotaki.com/about.html", which is the page I was trying to test the code with. It seems to me the callback is never being called.
Upvotes: 0
Views: 93
Reputation: 55932
I don't think your second rule is ever being applied. From the docs:
If multiple rules match the same link, the first one will be used, according to the order they’re defined in this attribute.
Because the first rule matches about.html, the second rule's callback is never fired.
I believe adding the callback to the first rule will work:
    Rule(SgmlLinkExtractor(allow=(), deny=('about.html')), callback='parseLink'),
or, if you just want to test the callback against the about page, remove the first rule entirely.
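As a minimal sketch of that second option, assuming the end goal is just to collect the matched URLs somewhere (the urls.txt file and the append-to-file approach are only an illustration, not part of the original code):

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

    class CrawlerSpider(CrawlSpider):
        name = 'crawler'
        allowed_domains = ["glauberkotaki.com"]
        start_urls = ["http://www.glauberkotaki.com"]

        # Single rule: every extracted link matching about.html triggers the callback,
        # so there is no earlier rule that can shadow it.
        rules = (
            Rule(SgmlLinkExtractor(allow=('about.html',)), callback='parseLink', follow=True),
        )

        def parseLink(self, response):
            # Called once per response matched by the rule above.
            print(response.url)
            with open('urls.txt', 'a') as f:
                f.write(response.url + '\n')

Note that follow is set explicitly here: when a Rule has a callback, CrawlSpider stops following links from those responses by default, so follow=True keeps the crawl going.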
Upvotes: 1