WebOrCode

Reputation: 7284

Scrapy Spider not following Request callback

I have read "Scrapy: Follow link to get additional Item data?" and followed it, but it is not working. It is probably some simple mistake, so I am posting the source code of my spider.

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector

class MySpider1(Spider):
    name = "timeanddate"
    allowed_domains = ["http://www.timeanddate.com"]
    start_urls = (
        'http://www.timeanddate.com/holidays/',
    )

    def parse(self, response):
        countries = Selector(response).xpath('//div[@class="fixed"]//li/a[contains(@href, "/holidays/")]')

        for item in countries:

            link = item.xpath('@href').extract()[0]
            country = item.xpath('text()').extract()[0]

            linkToFollow = self.allowed_domains[0] + link + "/#!hol=1"

            print link  # link
            print country  # text in a HTML tag
            print linkToFollow

            request = scrapy.Request(linkToFollow, callback=self.parse_page2)


    def parse_page2(self, response):
        print "XXXXXX"
        hxs = HtmlXPathSelector(response)

        print hxs

I am trying to get the list of all holidays for each country, which is why I need to follow the link to another page.

I cannot understand why parse_page2 is never called.

Upvotes: 1

Views: 706

Answers (1)

André Teixeira

Reputation: 2562

I was able to make your example work using link extractors.

Here is an example:

# -*- coding: utf-8 -*-
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.lxmlhtml import LxmlLinkExtractor


class TimeAndDateSpider(CrawlSpider):
    name = "timeanddate"
    allowed_domains = ["timeanddate.com"]  # domain only, no scheme
    start_urls = [
        "http://www.timeanddate.com/holidays/",
    ]

    # Follow every country link in the holidays index and pass each
    # response to second_page.
    rules = (
        Rule(
            LxmlLinkExtractor(
                restrict_xpaths=('//div[@class="fixed"]//li/a[contains(@href, "/holidays/")]',)
            ),
            callback='second_page',
        ),
    )

    # Callback for the second page (one per country)
    def second_page(self, response):
        print "second page - %s" % response.url

I will keep trying to get the Request callback example to work.

Upvotes: 1
