Mehdi

Reputation: 133

Readability lxml XPath does not work

I am trying to retrieve some items from a page using readability together with Scrapy. I have written this code:

titles = response.xpath("//a[@class='media__link']").extract()
#titles = response.xpath('//a/@href').extract()
print ("%d links was found" %len(titles))

count=0
for title in titles:
    item = TutsplusItem()
    item["title"] = title
    print("Title is : %s" %title)
    yield item
    titleInner = Document(title)
    link = titleInner.xpath("//a/@href")
    link = "http://www.bbc.com" + link
    response = requests.get(link)
    doc = Document(response)

    title=doc.xpath("//title/text()")
    headline=doc.xpath("//p[@class='story-body__introduction']/text()")
    bodyText=doc.xpath("//div[class='story-body__inner']/text()")

However, I get an error when I run xpath on the readability document on this line:

link = titleInner.xpath("//a/@href")

The error is:

Traceback (most recent call last):
  File "c:\python27\lib\site-packages\scrapy-1.3.1-py2.7.egg\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "c:\python27\lib\site-packages\scrapy-1.3.1-py2.7.egg\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "c:\python27\lib\site-packages\scrapy-1.3.1-py2.7.egg\scrapy\spidermiddlewares\referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "c:\python27\lib\site-packages\scrapy-1.3.1-py2.7.egg\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "c:\python27\lib\site-packages\scrapy-1.3.1-py2.7.egg\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\Mehdi\PycharmProjects\WebCrawler\src\Crawler.py", line 69, in parse
    link = titleInner.xpath("//a/@href")
TypeError: Type '' cannot be serialized.

I can't figure out where the problem is.

Upvotes: 0

Views: 53

Answers (1)

Mehdi

Reputation: 133

I worked around this by dropping readability and using lxml directly.
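For anyone landing here, a minimal sketch of what the lxml-only approach could look like. It assumes Python 3 and the `lxml` package; the helper names `extract_links` and `extract_story` are made up for illustration, and the class names are simply the ones from the question, so they may not match the live page. Note that `xpath()` on an attribute returns a list of strings, so each result has to be joined with the base URL individually rather than concatenated as a whole.

```python
from urllib.parse import urljoin

import lxml.html

BASE_URL = "http://www.bbc.com"  # base URL taken from the question


def extract_links(anchor_html, base_url=BASE_URL):
    """Return absolute URLs for every <a href> in an HTML fragment.

    Hypothetical helper: parses the extracted <a> snippet with lxml
    instead of wrapping it in a readability Document.
    """
    fragment = lxml.html.fromstring(anchor_html)
    # xpath() returns a list of attribute values, not a single string
    hrefs = fragment.xpath("//a/@href")
    return [urljoin(base_url, href) for href in hrefs]


def extract_story(page_html):
    """Pull title, headline and body text out of a full article page.

    Hypothetical helper: the class names come from the question and
    are assumptions about the page layout.
    """
    tree = lxml.html.fromstring(page_html)
    return {
        # string(...) yields the text of the first match, or "" if none
        "title": tree.xpath("string(//title)"),
        "headline": tree.xpath(
            "string(//p[@class='story-body__introduction'])"
        ),
        "body": " ".join(
            tree.xpath("//div[@class='story-body__inner']//p/text()")
        ),
    }
```

In the spider, `extract_links(title)` would replace the `Document(title)` / `xpath` step, and the text of the page fetched with `requests.get(link)` (that is, `response.text`, not the response object itself) would be passed to `extract_story`.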

Upvotes: 0
