I am using Python with Scrapy. I want to extract the text from a div tag that is nested inside a div with a class. For example:
<div class="ld-header">
<h1>2013 Gulfstream G650ER for Sale</h1>
<div id="header-price">Price - $46,500,000</div>
</div>
I've extracted the text from the h1 tag with:
result.xpath('//div[@class="ld-header"]/h1/text()').extract()
but I can't extract the price. I've tried:
'price': result.xpath('//div[@class="ld-header"]/div[@id="header-price"]/text()').extract()
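As a sanity check outside Scrapy, both paths can be evaluated against the posted snippet with Python's standard-library ElementTree. This is a stand-in for Scrapy's selectors, not Scrapy itself: ElementTree supports only a limited XPath subset (no text(), and paths must start with "." rather than "//"). On the snippet as posted, the price path does match, so if it comes back empty in the spider, the live page's markup presumably differs from the snippet.

```python
import xml.etree.ElementTree as ET

# The markup from the question, wrapped in <body> so the ld-header div is
# a descendant of the parsed root (ElementTree paths are relative to it).
SAMPLE = """<body>
<div class="ld-header">
<h1>2013 Gulfstream G650ER for Sale</h1>
<div id="header-price">Price - $46,500,000</div>
</div>
</body>"""

root = ET.fromstring(SAMPLE)

# ElementTree equivalents of the two XPaths in the question; .text replaces text().
title = root.find(".//div[@class='ld-header']/h1").text
price = root.find(".//div[@class='ld-header']/div[@id='header-price']").text

print(title)  # 2013 Gulfstream G650ER for Sale
print(price)  # Price - $46,500,000
```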
Try this and let me know :)
price = [x.replace('Price - ', '').replace('$', '') for x in result.xpath('//div[@class="ld-header"]/div[@id="header-price"]/text()').extract()]
This is a list comprehension (a 'for' loop over all the extracted items) that uses the 'replace()' method to strip out the parts you don't need.
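If you want a number rather than a string, the same comprehension can go one step further and drop the thousands separators before converting. A minimal sketch; parse_price is a hypothetical helper that assumes the "Price - $" prefix and comma separators seen in the sample markup.

```python
def parse_price(raw):
    """Turn a string like 'Price - $46,500,000' into the integer 46500000.

    Hypothetical helper: assumes the 'Price - $' prefix and comma
    thousands separators from the question's sample markup.
    """
    cleaned = raw.strip().replace('Price - ', '').replace('$', '').replace(',', '')
    return int(cleaned)


# Same list-comprehension shape as above, applied to what .extract()
# would return for the price div.
extracted = ['Price - $46,500,000']
prices = [parse_price(x) for x in extracted]
print(prices)  # [46500000]
```

int() raises ValueError if the page uses a different format (e.g. "Call for price"), which is usually preferable to silently storing a bad value.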
Since the element has an id, you don't need the full path to it; ids are supposed to be unique within a page.
This XPath:
//div[@id="header-price"]/text()
applied to the given HTML returns:
'Price - $46,500,000'
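Because real pages do not always honour id uniqueness, it can be worth confirming there is exactly one match. A sketch using the standard-library ElementTree as a stand-in for Scrapy's selectors (its limited XPath subset has no text(), so .text is used instead), run against the question's sample markup:

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in <body> so the divs are descendants of root.
SAMPLE = """<body>
<div class="ld-header">
<h1>2013 Gulfstream G650ER for Sale</h1>
<div id="header-price">Price - $46,500,000</div>
</div>
</body>"""

root = ET.fromstring(SAMPLE)

# Shorter path keyed on the id alone, the ElementTree equivalent of
# //div[@id="header-price"]/text().
matches = root.findall(".//div[@id='header-price']")

# Ids should be unique, but confirm exactly one match before trusting it.
if len(matches) != 1:
    raise ValueError(f"expected one #header-price div, found {len(matches)}")

print(matches[0].text)  # Price - $46,500,000
```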
For debugging XPath and CSS selectors, I always find it helpful to use an online checker (a quick search will turn up several).