How to parse HTML embedded in an XML file with Scrapy

Question

I am trying to parse an XML feed with XMLFeedSpider.

In the XML feed, I want to extract the 'price' item:

19,77 €

but this price is part of HTML code embedded in a CDATA section inside a tag, as follows:




<![CDATA[ product title ]]>


http://example.com/apage.html



   Prix normal :
40,00 € 
 Prix spécial :
19,77 € 
  
]]>

Here is my current spider:

from scrapy.contrib.spiders import XMLFeedSpider
from scrapy.selector import XmlXPathSelector
from tutorial.items import DmozItem

class DmozSpider(XMLFeedSpider):
    name = 'myspidername'
    allowed_domains = ["example.com"]
    start_urls = ['http://example.com/rss/catalog/new/store_id/1/']
    iterator = 'iternodes'
    itertag = 'channel'

    def parse_node(self, response, node):
        title = node.select('item/title/text()').extract()
        link = node.select('item/link/text()').extract()
        price = node.select('*[@class=price"]text()').extract()
        item = DmozItem()
        item['title'] = title
        item['link'] = link
        item['price'] = price
        return item

The result:

Invalid Xpath: *[@class=price"]text()
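
Two things appear to be going on here. First, the expression itself is malformed: the opening quote before price is missing and there is no / before text(), so a well-formed version would look like *[@class="price"]/text(). Second, even a well-formed expression would probably match nothing, because an XML parser treats the content of a CDATA section as plain text, not as child elements. The embedded HTML has to be read as a string and then parsed a second time. The following is a minimal sketch of that two-step idea using only the Python standard library so that it runs on its own. The feed below is hypothetical: the description element and the span class="price" markup are assumptions modelled on the sample above, not the real feed, and PriceCollector and extract_prices are illustrative names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical feed modelled on the question. The <description> element and
# the <span class="price"> markup are assumptions, not the real feed.
FEED = """<rss><channel><item>
<title><![CDATA[ product title ]]></title>
<link>http://example.com/apage.html</link>
<description><![CDATA[
<div><span class="price-label">Prix normal :</span>
<span class="price">40,00 €</span>
<span class="price-label">Prix spécial :</span>
<span class="price">19,77 €</span></div>
]]></description>
</item></channel></rss>"""


class PriceCollector(HTMLParser):
    """Collect the text of every element whose class attribute is 'price'."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a class="price" element
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Nested tag inside a price element: track it so the matching
            # end tag does not close the price element too early.
            self.depth += 1
        elif dict(attrs).get("class") == "price":
            self.depth = 1
            self.prices.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.prices[-1] += data


def extract_prices(feed_xml):
    """Return one list of price strings per <item> in the feed."""
    results = []
    for item in ET.fromstring(feed_xml).iter("item"):
        # Step 1: the CDATA body is just text as far as the XML parser goes.
        html_fragment = item.findtext("description", default="")
        # Step 2: parse that text again, this time as HTML.
        collector = PriceCollector()
        collector.feed(html_fragment)
        collector.close()
        results.append([p.strip() for p in collector.prices])
    return results


print(extract_prices(FEED))  # [['40,00 €', '19,77 €']]
```

Inside a Scrapy spider the second step would more likely be done by building a selector from the extracted string, for example Selector(text=description).xpath('//span[@class="price"]/text()').extract(), again assuming the price is wrapped in an element with class="price". Taking the last match would then give the special price (19,77 €) rather than the normal one.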
