Reputation: 4620
I have searched on Google and looked at questions on Stack Overflow too, but nothing works. I have tried
from scrapy.selector import HtmlXPathSelector
but that did not help either. response.body and response.headers work fine, but response.selector and response.xpath() raise an error saying that no such attribute exists on the response object.
I am not able to import Selector either, because there is no Selector in the scrapy directory hierarchy (I don't know why).
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        for sel in response.xpath('//ul/li'):
            title = sel.xpath('a/text()').extract()
            link = sel.xpath('a/@href').extract()
            desc = sel.xpath('text()').extract()
            print title, link, desc
I am using Scrapy 0.16. I'm working with Django Dynamic Scraper, which is only compatible with this version, so I can't upgrade.
Upvotes: 1
Views: 704
Reputation: 504
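You are probably looking at the documentation for the latest version. There have been quite a few changes since 0.16: the Selector class and the response.xpath() / response.selector shortcuts were added in later releases, which is why they don't exist in your installation. You should be looking at the documentation for 0.16: http://doc.scrapy.org/en/0.16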
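If you are not sure which API your installed copy exposes, a quick check is to look at which selector classes scrapy.selector actually provides. This is a minimal sketch; the function name available_selector_classes is made up for illustration. On 0.16 I would expect it to report only HtmlXPathSelector, which would match the missing Selector you describe.

```python
import importlib


def available_selector_classes():
    """Return the names of the selector classes that scrapy.selector provides."""
    try:
        module = importlib.import_module("scrapy.selector")
    except ImportError:
        # Scrapy (or one of its dependencies) cannot be imported in this interpreter
        return []
    # Selector only exists in later releases; HtmlXPathSelector is what 0.16 uses
    return [name for name in ("Selector", "HtmlXPathSelector")
            if hasattr(module, name)]


if __name__ == "__main__":
    print(available_selector_classes())
```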
Your example should look like this:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        # In 0.16 the response has no .xpath(); wrap it in HtmlXPathSelector
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        for site in sites:
            # each matched node is itself a selector, queried with .select()
            title = site.select('a/text()').extract()
            link = site.select('a/@href').extract()
            desc = site.select('text()').extract()
            print title, link, desc
This is the approach described in the 0.16 tutorial: http://doc.scrapy.org/en/0.16/intro/tutorial.html
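If the spider may later have to run on a newer Scrapy as well (for example once Django Dynamic Scraper supports one), one option is a small helper that calls whichever method the object has. This is a sketch under assumptions: run_xpath is a hypothetical name, it assumes 0.16 responses have neither .xpath() nor .select() while 0.16 selector nodes have .select(), and HtmlXPathSelector is passed in rather than imported so the helper does not depend on a particular Scrapy version.

```python
def run_xpath(obj, query, legacy_selector_cls=None):
    """Run an XPath query on a Scrapy response or selector, old or new API.

    legacy_selector_cls should be HtmlXPathSelector when running on 0.16
    (assumption); it is only used when obj has neither .xpath() nor .select().
    """
    if hasattr(obj, "xpath"):
        # Later releases: responses and selectors expose .xpath() directly
        return obj.xpath(query)
    if hasattr(obj, "select"):
        # 0.16 selector nodes (items returned by hxs.select(...)) use .select()
        return obj.select(query)
    if legacy_selector_cls is None:
        raise TypeError("object has no .xpath() or .select(); "
                        "pass HtmlXPathSelector as legacy_selector_cls")
    # 0.16 response: wrap it first, like HtmlXPathSelector(response) in the tutorial
    return legacy_selector_cls(obj).select(query)
```

On 0.16 the loop in parse() would then start with for site in run_xpath(response, '//ul/li', HtmlXPathSelector): and use run_xpath(site, 'a/text()').extract() inside it; on versions where responses and selectors expose .xpath(), the same calls should take the first branch instead.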
Upvotes: 1