Reputation: 21
I'm scraping prices from a website. It works, but it only captures the digits in front of the comma.
Example: when something costs "€ 79,90", it scrapes only the 79 and not the 90. The markup looks like this:
<span class="price right right10">
€ 79,
<sup>
90*
</sup>
</span>
I want to store this in one item field like this:
class Prices(scrapy.Item):
    price = scrapy.Field()
This is my current xpath selector:
item['price'] = ''.join(sel.xpath('div[@class="waresSum"]/p/span/text()').extract())
Upvotes: 1
Views: 285
Reputation: 474171
The key problem is that you are asking only for the direct text child nodes of the span. The "90" sits inside the nested sup element, so you need to get all the text nodes from inside the span element:
//div[@class="waresSum"]/p/span//text()
                           HERE^
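To see the difference without running a spider, here is a rough sketch using only the standard library. It is a stand-in for the Scrapy selector, not Scrapy itself: ElementTree does not support text() in its XPath subset, so the two cases are emulated with .text/.tail and itertext(). The surrounding div and p elements are assumed from the XPath in the question.

```python
import xml.etree.ElementTree as ET

# Markup from the question; the div/p wrapper is an assumption based on the XPath.
HTML = """<div class="waresSum"><p><span class="price right right10">
€ 79,
<sup>
90*
</sup>
</span></p></div>"""

root = ET.fromstring(HTML)
span = root.find("./p/span")

# Rough equivalent of span/text(): only text nodes that are direct children of the span.
direct = [span.text or ""] + [child.tail or "" for child in span]

# Rough equivalent of span//text(): text nodes at any depth below the span.
nested = list(span.itertext())

print(repr("".join(direct)))  # contains "79," but not "90"
print(repr("".join(nested)))  # contains both "79," and "90*"
```

The direct text nodes stop at the sup boundary, which is why only the part before the comma was being scraped.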
Also, I would use .re() to filter out unwanted characters and keep only digits, commas (,) and hyphens (-):
$ scrapy shell index.html
In [9]: ''.join(response.xpath('//span//text()').re(r'[0-9,\-]+'))
Out[9]: u'79,90'
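If you want to check the cleanup step on its own, something like the following should work with plain re. This is only a sketch: clean_price is a hypothetical helper, the list of text nodes is what I would expect //text() to return for the markup in the question, and the Decimal conversion assumes the comma is a decimal separator with no thousands separators.

```python
import re
from decimal import Decimal

def clean_price(text_nodes):
    """Join the text nodes and keep only digits, commas and hyphens."""
    joined = "".join(text_nodes)
    # Same character class as the .re() call above, applied with the stdlib.
    return "".join(re.findall(r"[0-9,\-]+", joined))

# Assumed text nodes for the span in the question (whitespace included).
nodes = ["\n€ 79,\n", "\n90*\n", "\n"]

price = clean_price(nodes)
print(price)  # 79,90

# Optional: convert to a number, assuming "," is the decimal separator.
amount = Decimal(price.replace(",", "."))
print(amount)  # 79.90
```

Storing the cleaned string in item['price'] matches what the question asks for; the Decimal step is only useful if you need to do arithmetic on the value later.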
Upvotes: 3