I am trying to extract the price text from a Pottery Barn Kids page
in the Scrapy shell. I started it with scrapy shell "https://www.potterybarnkids.com/shop/easter/easter-shop-all-baskets/"
and am now trying to get the price inside span class="price-state price-sale".
Is there a way to extract the entire text inside the span without going into each span inside it?
I tried
response.xpath('//span[@class="price-state price-sale"]/text()').extract()
also response.xpath('//span[@class="price-state price-sale"]//text()')[0].extract()
I need a way to extract all the text inside the selector, whether it has inner spans, divs, ...
I checked "How can i extract only text in scrapy selector in python" and also "Scrapy extracting text from div". In the latter, the answer assumes the element contains only span children, which works in that example and in this one. But is there a more general way to correctly extract all the text inside the children? //text()
isn't working for me.
Upvotes: 1
Views: 3139
Reputation: 675
There may be more efficient ways, but the following XPath
does the job. The XPath string() function
returns the concatenated text of the node and all of its descendants.
You can find more information about the differences between string()
and text()
in this post: Difference between text() and string()
prices = [
r.xpath('string(.)').extract_first()
for r in response.xpath('//span[@class="price-state price-sale"]')
]
As you can see in the results, there is one entry per product. It can be cleaned up with replace,
for example, or the prices can be extracted using a regex:
>>> prices
['\n\nSale\n\n\n$5.99\n–\n\n$18.99\n', '\n\nSale\n\n\n$6...
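The cleanup mentioned above could look roughly like the sketch below. It works on the first entry of the output shown, copied in as a literal so that it runs outside the Scrapy shell. The names raw, collapsed, PRICE_RE and amounts are only illustrative, and the regex assumes prices are written like $5.99 or $1,299.00, which may not hold for every product.

```python
import re

# Sample value copied from the first entry of the output above.
raw = '\n\nSale\n\n\n$5.99\n–\n\n$18.99\n'

# Collapse every run of whitespace (including newlines) into one space.
collapsed = ' '.join(raw.split())

# Assumed price format: a dollar sign, digits with optional thousands
# separators, and an optional two-digit decimal part.
PRICE_RE = re.compile(r'\$\s*(\d[\d,]*(?:\.\d{2})?)')

# Convert each matched amount to a float, dropping thousands separators.
amounts = [float(m.replace(',', '')) for m in PRICE_RE.findall(raw)]

print(collapsed)
print(amounts)
```

Splitting on whitespace and joining again is usually simpler than chaining several replace calls, since it handles any mix of newlines and spaces in one step.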
Another option is to do it in two steps, using text()
instead of string()
and cleaning the data before the join
operation:
>>> prices = []
>>> for r in response.xpath('//span[@class="price-state price-sale"]'):
...     price = [p.strip() for p in r.xpath('.//text()').extract() if p.strip()]
...     prices.append(' '.join(price))
The results in this case are already cleaned:
>>> prices
['Sale $ 5.99 – $ 18.99', 'Sale $ 6.99 – $ 18.99', 'Sale $ 6.99...
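To see why the two approaches give differently spaced results, here is a small offline sketch. It does not use Scrapy at all: it uses the standard library's ElementTree as a rough analogy, where itertext() walks all descendant text much like string(.) or .//text() would. The markup in SNIPPET is hypothetical and only loosely modelled on a nested price span. It is not the real page source.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an outer span whose text lives in inner spans.
SNIPPET = (
    '<span class="price-state price-sale">\n'
    '<span class="label">Sale</span>\n'
    '<span class="currency">$</span><span class="amount">5.99</span>\n'
    '</span>'
)

node = ET.fromstring(SNIPPET)

# Roughly what string(.) returns: all descendant text concatenated as-is,
# so the whitespace between the inner spans is kept.
as_string = ''.join(node.itertext())

# Roughly what .//text() returns: one fragment per text node.
fragments = list(node.itertext())

# Strip each fragment, drop the empty ones, then join with single spaces.
cleaned = ' '.join(f.strip() for f in fragments if f.strip())

print(repr(as_string))
print(cleaned)
```

Because each text node becomes its own fragment, the currency sign and the amount end up separated by a space after the join, which matches the '$ 5.99' spacing in the results above.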
Upvotes: 2