Reputation: 2523

Get text via XPath, ignoring markup

I have to retrieve text from the cells of an HTML table. In some cells the text is inside a <div>, and in others it is not.

How can I make a div in an XPath expression optional?

My current code:

stuff = tree.xpath("/html/body/table/tbody/tr/td[5]/div/text()")

Wanted pseudocode:

stuff = tree.xpath("/html/body/table/tbody/tr/td[5]/div or nothing/text()")

Upvotes: 2

Answers (1)

Reputation: 111726

You want the string value of the td[5] element. Use string():

stuff = tree.xpath("string(/html/body/table/tbody/tr/td[5])")

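This returns the concatenated text of every text node beneath td[5], with the markup dropped, so it works whether or not the text is wrapped in a <div>. Note that string() applied to a node-set uses only the first node in document order, so if the path matches td[5] in several rows you get the value for the first row only.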
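
A minimal sketch of this with lxml, assuming a made-up two-row table (the SAMPLE markup and the use of td[2] instead of td[5] are only for illustration, not taken from the question). It also shows one way to collect a value from every row: select the cells first, then evaluate string(.) relative to each cell.

```python
from lxml import html

# Hypothetical sample: the second cell holds bare text in row 1
# and text wrapped in a <div> in row 2.
SAMPLE = (
    "<html><body><table><tbody>"
    "<tr><td>a</td><td>plain text</td></tr>"
    "<tr><td>b</td><td><div>wrapped text</div></td></tr>"
    "</tbody></table></body></html>"
)

tree = html.fromstring(SAMPLE)

# string() over a node-set takes the string value of the first match only.
first = tree.xpath("string(/html/body/table/tbody/tr/td[2])")

# To get one value per row, select the cells, then call string(.) on each.
cells = tree.xpath("/html/body/table/tbody/tr/td[2]")
values = [cell.xpath("string(.)") for cell in cells]

print(first)
print(values)
```

Selecting the cells first keeps the row-to-value mapping explicit, at the cost of one XPath evaluation per cell.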

You can also obtain the string value of an element indirectly via normalize-space(), as suggested by splash58 in the comments, if you also want leading and trailing whitespace trimmed and each interior run of whitespace collapsed to a single space.
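
A small sketch comparing the two functions with lxml, assuming a made-up cell with padded text (the PADDED markup is hypothetical):

```python
from lxml import html

# Hypothetical cell whose text has leading, trailing, and repeated
# interior whitespace, including a line break.
PADDED = (
    "<html><body><table><tbody><tr>"
    "<td>\n  <div>  some   padded\n text </div>\n</td>"
    "</tr></tbody></table></body></html>"
)

tree = html.fromstring(PADDED)
path = "/html/body/table/tbody/tr/td[1]"

# string() keeps the whitespace as it appears in the document.
raw = tree.xpath(f"string({path})")

# normalize-space() converts its argument to a string, strips both ends,
# and collapses each interior run of whitespace to one space.
clean = tree.xpath(f"normalize-space({path})")

print(repr(raw))
print(repr(clean))
```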

Upvotes: 1