extract text from div selector with Scrapy

Question

I am trying to get the price text from the Pottery Barn Kids site in the Scrapy shell. I ran scrapy shell "https://www.potterybarnkids.com/shop/easter/easter-shop-all-baskets/" and am now trying to get the price inside span class="price-state price-sale". Is there a way to extract the entire text inside that span, including the text of every span nested inside it?

I tried

response.xpath('//span[@class="price-state price-sale"]/text()').extract()

and also

response.xpath('//span[@class="price-state price-sale"]//text()')[0].extract()

I need a way to extract all the text inside the selector, whether or not it has inner spans, divs, and so on.

I checked "How can i extract only text in scrapy selector in python" and "Scrapy extracting text from div". In the latter, the answer assumes the element contains only span children, which works for that example and for this one. But is there a more general way to correctly extract all the text inside the children? //text() isn't working for me.

Marcos · Accepted Answer

There may be more efficient ways, but the following XPath does the job. The XPath string() function gathers the text from all descendant nodes of the element.

You can find more information about the differences between string() and text() in this post: Difference between text() and string()

prices = [
    r.xpath('string(.)').extract_first() 
    for r in response.xpath('//span[@class="price-state price-sale"]')
]

As the results show, there is one product per list entry. The whitespace could be cleaned up with replace, for example, or the prices extracted with a regex.

>>> prices
['

Sale


$5.99
–

$18.99
', '

Sale


$6...
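The cleanup mentioned above could look roughly like the following sketch. It is only an illustration: the sample string imitates the shape of the output shown, the helper names clean_text and extract_amounts are made up for this example, and the regex assumes prices are written like $5.99.

```python
import re

# Hypothetical sample shaped like one entry of the string(.) output above.
raw = "\n\nSale\n\n\n$5.99\n–\n\n$18.99\n"

# Assumption: a price is a dollar sign, optional whitespace, digits,
# and an optional two-digit cents part.
PRICE_RE = re.compile(r"\$\s*(\d+(?:\.\d{2})?)")


def clean_text(text):
    """Collapse every run of whitespace (newlines included) into one space."""
    return " ".join(text.split())


def extract_amounts(text):
    """Return the numeric part of every dollar amount found in the text."""
    return PRICE_RE.findall(text)


print(clean_text(raw))
print(extract_amounts(raw))
```

Splitting on whitespace and re-joining avoids chaining several replace calls, and the regex keeps only the amounts when the "Sale" label is not needed.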

Another option is to do it in two steps, using text() instead of string() and cleaning the data before the join operation:

>>> prices = []
>>> for r in response.xpath('//span[@class="price-state price-sale"]'):
...     price = [p.strip() for p in r.xpath('.//text()').extract() if p.strip()]
...     prices.append(' '.join(price))
...

The results in this case are already cleaned:

>>> prices
['Sale $ 5.99 – $ 18.99', 'Sale $ 6.99 – $ 18.99', 'Sale $ 6.99...
