Reputation: 81

Extract text from div class with scrapy

I am using Python with Scrapy. I want to extract the text from a div that is nested inside another div identified by its class. For example:

 <div class="ld-header">
    <h1>2013 Gulfstream G650ER  for Sale</h1>
    <div id="header-price">Price - $46,500,000</div>
</div>

I've extracted the text from the h1 tag with:

result.xpath('//div[@class="ld-header"]/h1/text()').extract()

but I can't extract the price. I've tried:

'price': result.xpath('//div[@class="ld-header"]/div[@id="header-price"]/text()').extract()

Upvotes: 1

Answers (2)

Reputation: 1690

Try this one and let me know :)

price = [x.replace('Price - ', '').replace('$', '') for x in result.xpath('//div[@class="ld-header"]/div[@id="header-price"]/text()').extract()]

This is a list comprehension: it loops over every string returned by the extraction and uses the 'replace()' method to strip out the parts you don't need.
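If the goal is a number rather than a trimmed string, the same clean-up can be sketched with a regular expression. This is only a minimal illustration using the standard library; the helper name parse_price is made up for this example, and it assumes the price is a whole dollar amount with comma separators, as in the sample markup.

```python
import re


def parse_price(raw):
    """Return the dollar amount in a string like 'Price - $46,500,000' as an int.

    Hypothetical helper for illustration; returns None when no '$' amount is found.
    """
    # Capture the digits (and thousands separators) that follow the dollar sign.
    match = re.search(r'\$\s*(\d[\d,]*)', raw)
    if match is None:
        return None
    # Drop the commas before converting to an integer.
    return int(match.group(1).replace(',', ''))


print(parse_price('Price - $46,500,000'))
```

It could be applied to each string that extract() returns, in place of the chained 'replace()' calls.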

Upvotes: 1

Reputation: 434

Since the element has an id, you do not need the complete path to it. Ids are meant to be unique within a web page.

This Xpath:

//div[@id="header-price"]/text()

used on the given HTML will return:

'Price - $46,500,000'
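One way to check the id-based lookup without running a spider is a quick offline test. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selector, which works here only because the sample snippet happens to be well-formed XML. ElementTree supports just a subset of XPath and has no text() step, so the text is read from the element's .text attribute instead.

```python
import xml.etree.ElementTree as ET

# The sample markup from the question, treated as well-formed XML.
SAMPLE = """
<div class="ld-header">
    <h1>2013 Gulfstream G650ER  for Sale</h1>
    <div id="header-price">Price - $46,500,000</div>
</div>
"""

root = ET.fromstring(SAMPLE.strip())

# Locate the element by its id alone, without spelling out the parent path.
price_node = root.find('.//div[@id="header-price"]')

# Guard against a missing element instead of assuming it is always present.
price_text = price_node.text if price_node is not None else None
print(price_text)
```

If this works offline but the spider still returns an empty list, the page Scrapy downloads may differ from what the browser shows, so it is worth inspecting the raw response body.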

For debugging XPath and CSS selectors, I always find it helpful to use an online checker (a quick web search will turn up several).

Upvotes: 1