I keep getting HTML in the XPath output! How do I get just the text?

Question

I keep getting HTML along with the text I want from the XPath I am running, and I can't work out how to stop it. I just want the text.

The XPath

hxs.xpath('//h1[@class="body2"]').extract()

The HTML

Cornish Ale Dozen - Case of 12

Any suggestions would be appreciated, thanks.

har07 · Accepted Answer

A pure XPath expression that selects the text nodes instead of the parent element would be:

//h1[@class="body2"]/text()

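Your original expression selects the h1 element itself, so extract() serializes the whole element, tags included; appending /text() selects only its child text nodes. In particular, the XPath above should work as you expect, assuming the library executing it is Scrapy.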
