I have to retrieve text from the cells of an HTML table. In some cells the text is inside a <div>, and in others it is not.
How can I make the div in an XPath optional?
My current code:
stuff = tree.xpath("/html/body/table/tbody/tr/td[5]/div/text()")
What I want (pseudocode):
stuff = tree.xpath("/html/body/table/tbody/tr/td[5]/div or nothing/text()")
You want the string value of the td[5] element. Use string():
stuff = tree.xpath("string(/html/body/table/tbody/tr/td[5])")
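This returns all the text beneath td[5] with the markup stripped, so it works whether or not the text is wrapped in a div. Note that when the path matches several td[5] cells (one per row), string() converts only the first of them in document order.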
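A minimal sketch with lxml, assuming the page is parsed with lxml.html and that the table really contains a <tbody> (the sample HTML below is hypothetical and includes one explicitly). It shows string() handling both the plain and the div-wrapped cell, and evaluates string(.) per cell to get a value for every row, since string() on the whole path only converts the first match:

```python
from lxml import html

# Hypothetical sample: the 5th cell holds its text directly in row 1
# and inside a <div> in row 2.
doc = """<html><body><table><tbody>
<tr><td>a</td><td>b</td><td>c</td><td>d</td><td>plain text</td></tr>
<tr><td>a</td><td>b</td><td>c</td><td>d</td><td><div>wrapped text</div></td></tr>
</tbody></table></body></html>"""

tree = html.fromstring(doc)

# string() on a node-set converts only the FIRST matching node.
first = tree.xpath("string(/html/body/table/tbody/tr/td[5])")

# To cover every row, select the cells first, then take each one's string value.
cells = tree.xpath("/html/body/table/tbody/tr/td[5]")
stuff = [td.xpath("string(.)") for td in cells]

print(first)  # plain text
print(stuff)  # ['plain text', 'wrapped text']
```

Selecting the td elements and converting each one keeps the div question out of the XPath entirely: the string value is the same whether the text sits directly in the cell or one level down.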
You can also obtain the string value of an element indirectly via normalize-space(), as suggested by splash58 in the comments, if you also want leading and trailing whitespace trimmed and interior runs of whitespace collapsed to a single space.
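A small sketch of the normalize-space() variant, again assuming lxml.html and a hypothetical sample cell whose text is padded and split across lines. It compares the raw string value with the normalized one, evaluated per cell:

```python
from lxml import html

# Hypothetical cell with leading/trailing and interior whitespace.
doc = """<html><body><table><tbody>
<tr><td>a</td><td>b</td><td>c</td><td>d</td><td>
  <div>  wrapped
     text  </div>
</td></tr>
</tbody></table></body></html>"""

tree = html.fromstring(doc)
cells = tree.xpath("/html/body/table/tbody/tr/td[5]")

# string(.) keeps the whitespace exactly as it appears in the source.
raw = [td.xpath("string(.)") for td in cells]

# normalize-space(.) trims the ends and collapses interior whitespace.
clean = [td.xpath("normalize-space(.)") for td in cells]

print(repr(raw[0]))  # still contains the newlines and padding
print(clean)         # ['wrapped text']
```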