j.white

Reputation: 27

I keep getting HTML in the XPath output! How do I just get text?

The XPath I am running returns HTML as well as the text I want, and I can't work out how to stop it. I just want the text.

The XPath:

hxs.xpath('//h1[@class="body2"]').extract()

The HTML:

<div class="product-title cf">


            <h1 itemprop="name" class="body2">
                Cornish Ale Dozen - Case of 12
            </h1>


</div>

Any suggestions would be appreciated, thanks.

Upvotes: 1

Views: 42

Answers (1)

har07

Reputation: 89285

The pure XPath way to select the text nodes instead of their parent element is as follows:

//h1[@class="body2"]/text()

In particular, the above XPath should work as you expect, assuming that the library being used to execute it is Scrapy.
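To show the difference between the two expressions, here is a minimal sketch that runs outside a Scrapy project. It assumes lxml is installed (Scrapy's selectors are built on top of it), and the names `SNIPPET`, `tree`, `titles` and `title` are only for illustration. In your spider the equivalent call would be `hxs.xpath('//h1[@class="body2"]/text()').extract()`.

```python
from lxml import html

# The HTML from the question, reduced to the relevant fragment.
SNIPPET = """<div class="product-title cf">
    <h1 itemprop="name" class="body2">
        Cornish Ale Dozen - Case of 12
    </h1>
</div>"""

tree = html.fromstring(SNIPPET)

# Selecting the element returns the <h1> node itself, so serializing it
# gives back the markup as well as the text.
elements = tree.xpath('//h1[@class="body2"]')

# Appending /text() selects only the text nodes inside the <h1>.
raw_text = tree.xpath('//h1[@class="body2"]/text()')

# The text node keeps the newlines and indentation from the source, so strip it.
titles = [t.strip() for t in raw_text]

# Alternatively, normalize-space() trims inside XPath and returns one string.
title = tree.xpath('normalize-space(//h1[@class="body2"])')

print(titles)
print(title)
```

Note that `/text()` returns the text exactly as it appears in the page, including the surrounding whitespace, so you will probably want either the `strip()` call or `normalize-space()`.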

Upvotes: 1
