Reputation: 7284
I have read Scrapy: Follow link to get additional Item data? and followed it, but it is not working; probably it is some simple mistake, so I am posting the source code of my spider.
    import scrapy
    from scrapy.spider import Spider
    from scrapy.selector import Selector


    class MySpider1(Spider):
        name = "timeanddate"
        allowed_domains = ["http://www.timeanddate.com"]
        start_urls = (
            'http://www.timeanddate.com/holidays/',
        )

        def parse(self, response):
            countries = Selector(response).xpath('//div[@class="fixed"]//li/a[contains(@href, "/holidays/")]')
            for item in countries:
                link = item.xpath('@href').extract()[0]
                country = item.xpath('text()').extract()[0]
                linkToFollow = self.allowed_domains[0] + link + "/#!hol=1"
                print link          # link
                print country       # text in a HTML tag
                print linkToFollow
                request = scrapy.Request(linkToFollow, callback=self.parse_page2)

        def parse_page2(self, response):
            print "XXXXXX"
            hxs = HtmlXPathSelector(response)
            print hxs
I am trying to get a list of all holidays for each country; that is why I need to follow the link to another page.
I cannot understand why parse_page2 is not called.
Upvotes: 1
Views: 706
Reputation: 2562
I could make your example work using link extractors. Here is an example:
    # -*- coding: utf-8 -*-
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.lxmlhtml import LxmlLinkExtractor


    class TimeAndDateSpider(CrawlSpider):
        name = "timeanddate"
        allowed_domains = ["timeanddate.com"]
        start_urls = [
            "http://www.timeanddate.com/holidays/",
        ]

        rules = (
            Rule(LxmlLinkExtractor(restrict_xpaths=('//div[@class="fixed"]//li/a[contains(@href, "/holidays/")]',)),
                 callback='second_page'),
        )

        # 2nd page
        def second_page(self, response):
            print "second page - %s" % response.url
Will keep trying to make the Request callback example work.
Upvotes: 1