MLSC

Reputation: 5972

Grabbing a value from HTML by XPath in Python

I want to use XPath to grab the phrase WahtIwant from this HTML string:

a="<b>AAA:</b> BBB<br/><br/><img src='line.gif' /><br/><br/><b><font size='2'>Text: </b>WahtIwant</font><br/><center>"

This is what I tried, but it returns only the label, not WahtIwant:

from lxml import html
tree=html.fromstring(a)
tree.xpath('//font[@size="2"]/text()')
['Text: ']

Upvotes: 1

Views: 68

Answers (2)

har07

Reputation: 89325

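From XPath's point of view, the text you want is a following sibling of the <b> element that is the parent of font[@size="2"]. The parser closes the <font> at the mis-nested </b>, so the text ends up outside both elements: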

tree.xpath('//font[@size="2"]/parent::b/following-sibling::text()')

Alternatively, select the <b> element that has a child font whose size attribute equals 2, then select the text nodes that follow that <b>:

tree.xpath('//b[font/@size="2"]/following-sibling::text()')
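For reference, here is a minimal runnable sketch that applies both expressions to the string from the question. It assumes lxml is installed; the variable names via_parent and via_predicate are only illustrative.

```python
from lxml import html

# Markup from the question; note the mis-nested </b> and </font> tags.
a = ("<b>AAA:</b> BBB<br/><br/><img src='line.gif' /><br/><br/>"
     "<b><font size='2'>Text: </b>WahtIwant</font><br/><center>")

tree = html.fromstring(a)

# Walk up from the <font> to its parent <b>, then take the text nodes after it.
via_parent = tree.xpath('//font[@size="2"]/parent::b/following-sibling::text()')

# Select the <b> directly by its <font size="2"> child.
via_predicate = tree.xpath('//b[font/@size="2"]/following-sibling::text()')

print(via_parent)
print(via_predicate)
```

Both calls return a list of text nodes, so take the first item (and strip it if needed) when you want a single string.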

Upvotes: 1

falsetru

Reputation: 369424

You can also use lxml and the element's tail property, which holds the text that directly follows the element:

>>> import lxml.html
>>> 
>>> a = "<b>AAA:</b> BBB<br/><br/><img src='line.gif' /><br/><br/><b><font size='2'>Text: </b>WahtIwant</font><br/><center>"
>>> root = lxml.html.fromstring(a)
>>> [x.tail for x in root.xpath('//font[@size="2"]/parent::b')]
['WahtIwant']
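To make the difference between text and tail concrete, here is a small sketch that inspects both on the parsed tree. It assumes lxml repairs the mis-nested markup the same way as in the session above; the names font and b are only illustrative.

```python
import lxml.html

a = ("<b>AAA:</b> BBB<br/><br/><img src='line.gif' /><br/><br/>"
     "<b><font size='2'>Text: </b>WahtIwant</font><br/><center>")

root = lxml.html.fromstring(a)

# The <font size="2"> element and the <b> element that contains it.
font = root.xpath('//font[@size="2"]')[0]
b = font.getparent()

# .text is the text inside an element, before its first child.
print(repr(font.text))

# .tail is the text that directly follows an element's closing tag.
print(repr(b.tail))
```

This is why the query in the question found only the label: the wanted phrase is stored as the tail of the <b>, not as text of the <font>.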

Upvotes: 0
