user2199416

Reputation: 21

Scrapy: store data with part in a nested tag in one item field

I'm scraping prices from a website. It works, but it only captures the digits in front of the comma.

Example: when something is worth "€ 79,90" it will only scrape the 79, and not the 90.

<span class="price right right10">
    € 79,
    <sup>
    90*
    </sup>
</span>

I want to store this in one item field like this:

class Prices(scrapy.Item):
    price = scrapy.Field()

This is my current xpath selector:

item['price'] = ''.join(sel.xpath('div[@class="waresSum"]/p/span/text()').extract())

Upvotes: 1

Views: 285

Answers (1)

alecxe

Reputation: 474171

The key problem is that you are asking only for the direct text child nodes of the span. You need to get all the text nodes inside the span element, including the one nested in sup:

//div[@class="waresSum"]/p/span//text()
                            HERE^

Also, I would use .re() to filter out unwanted characters and keep only digits, commas (,), and hyphens (-):

$ scrapy shell index.html
In [9]: ''.join(response.xpath('//span//text()').re(r'[0-9,\-]+'))
Out[9]: u'79,90'
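
To see the difference between direct and descendant text nodes without running Scrapy, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of Scrapy selectors. It is only an approximation: span.text stands in for span/text(), "".join(span.itertext()) stands in for span//text(), and clean_price is a hypothetical helper name, not part of Scrapy.

```python
import re
import xml.etree.ElementTree as ET

# Markup from the question as a well-formed fragment (\u20ac is the euro sign).
HTML = """<span class="price right right10">
    \u20ac 79,
    <sup>
    90*
    </sup>
</span>"""

span = ET.fromstring(HTML)

# Text before the first child element only; this approximates span/text(),
# which is why the part inside <sup> gets lost.
direct_text = span.text or ""

# Text of the element and all of its descendants; this approximates
# span//text() after joining the pieces.
all_text = "".join(span.itertext())


def clean_price(text):
    """Keep only runs of digits, commas, and hyphens, then join them."""
    return "".join(re.findall(r"[0-9,\-]+", text))


print(clean_price(direct_text))  # 79,
print(clean_price(all_text))     # 79,90
```

The same regular expression as in the .re() call above is applied here with re.findall, so the cleaned result matches the shell output.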

Upvotes: 3
