Reputation: 1200
<div class="container-body">
<div class="rule"><hr></div>
<h3>Software version:</h3>
10.0.0
<div class="rule"><hr></div>
<h3>Operating system(s):</h3>
AIX, Linux, Windows
<div class="rule"><hr></div>
<h3>Reference #:</h3>
7042947
<div class="rule"><hr></div>
<h3>Modified date:</h3>
<p>2015-04-02</p>
</div>
Given the above code segment, how do I get the values 10.0.0; AIX, Linux, Windows; and 7042947, considering that they are not within any HTML tags?
Upvotes: 2
Views: 3520
Reputation: 57149
As often, the answer is: "it depends". If you just need the non-whitespace text nodes within <div>, you can use the following, but it will select any non-whitespace text node that is a direct child of <div> (but not grandchildren).
div/text()[normalize-space()]
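For readers without a full XPath engine, here is a minimal Python sketch of the same idea using only the standard library. It rests on two assumptions: the snippet is rewritten with <hr/> instead of <hr> so that it parses as XML, and the helper name direct_text_nodes is made up for illustration. ElementTree has no text() step, so the direct text children are read from the element's text and from each child's tail.

```python
import xml.etree.ElementTree as ET

# The question's snippet, with <hr> written as <hr/> so it is well-formed XML
# (an assumption; the original markup is HTML).
SNIPPET = """<div class="container-body">
<div class="rule"><hr/></div>
<h3>Software version:</h3>
10.0.0
<div class="rule"><hr/></div>
<h3>Operating system(s):</h3>
AIX, Linux, Windows
<div class="rule"><hr/></div>
<h3>Reference #:</h3>
7042947
<div class="rule"><hr/></div>
<h3>Modified date:</h3>
<p>2015-04-02</p>
</div>"""


def direct_text_nodes(element):
    """Return the non-whitespace text nodes that are direct children of element.

    Hypothetical helper: element.text is the text before the first child,
    and each child's tail is the text that follows that child.
    """
    candidates = [element.text] + [child.tail for child in element]
    return [text.strip() for text in candidates if text and text.strip()]


root = ET.fromstring(SNIPPET)
values = direct_text_nodes(root)
print(values)
```

The date inside <p> is not returned, because it is a grandchild of the outer <div> rather than a direct text child.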
If you only want the text nodes following <div class="rule">... and <h3> explicitly, you can instruct XPath to do so:
div
/div[@class="rule"]
/following-sibling::*[1]
/self::h3
/following-sibling::text()[1]
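Which means:
select the <div>
select its child <div> with attribute class="rule"
take the first following sibling element
keep it only if it is an h3
select the first text node that follows that h3 as a sibling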
Or, if you want to select any non-whitespace text node in the whole document that is preceded by an <h3>, you can do the following:
//text()[normalize-space()][preceding-sibling::*[1]/self::h3]
This last expression is specifically crafted to ignore any comment nodes or processing instructions and only select the text node if its nearest preceding sibling element is <h3>; otherwise it will ignore it.
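As a rough illustration of what that last expression selects, the following Python sketch emulates it with the standard library's html.parser instead of an XPath engine, so it can read the question's HTML as written (including the unclosed <hr>). The class name TextAfterH3 and the short VOID_TAGS set are assumptions made for this example, not part of any library.

```python
from html.parser import HTMLParser

HTML = """<div class="container-body">
<div class="rule"><hr></div>
<h3>Software version:</h3>
10.0.0
<div class="rule"><hr></div>
<h3>Operating system(s):</h3>
AIX, Linux, Windows
<div class="rule"><hr></div>
<h3>Reference #:</h3>
7042947
<div class="rule"><hr></div>
<h3>Modified date:</h3>
<p>2015-04-02</p>
</div>"""

# Elements that never get a closing tag (a deliberately short, assumed list).
VOID_TAGS = {"hr", "br", "img", "input", "meta", "link"}


class TextAfterH3(HTMLParser):
    """Collect non-whitespace text whose nearest preceding sibling element is <h3>."""

    def __init__(self):
        super().__init__()
        # One slot per open element: the tag of its most recent child element.
        self.last_child = [None]
        self.values = []

    def handle_starttag(self, tag, attrs):
        self.last_child[-1] = tag          # record this tag as the newest sibling
        if tag not in VOID_TAGS:
            self.last_child.append(None)   # descend: no children seen yet

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.last_child) > 1:
            self.last_child.pop()          # back to the parent's level

    def handle_data(self, data):
        # Comments and processing instructions never reach this method,
        # so they cannot change which element counts as the preceding sibling.
        if data.strip() and self.last_child[-1] == "h3":
            self.values.append(data.strip())


parser = TextAfterH3()
parser.feed(HTML)
parser.close()
print(parser.values)
```

The heading texts themselves are skipped because, inside an <h3>, no sibling element has been seen yet, and the date is skipped because its parent is <p>.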
Hopefully the above examples give you enough tools to construct your XPath, but if your requirement isn't in there and you can't figure it out, just ask.
Upvotes: 3
Reputation: 14231
The XPath may be as simple as:
"*/text()"
or as:
"*/text()[normalize-space()]"
Which one you need depends on the library.
Upvotes: 1
Reputation: 352
To get AIX, Linux, Windows, use the following XPath:
//h3[2]/following-sibling::text()[1]
Similarly, create other XPaths to get the other strings.
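A small Python sketch of the same positional lookup, assuming the standard library's ElementTree and a trimmed, XML-clean version of the question's markup (both are assumptions for illustration). ElementTree supports the h3[2] position predicate but not text(), so the text that follows the element is read from its tail.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the question's snippet, kept well-formed for the XML parser.
SNIPPET = """<div class="container-body">
<h3>Software version:</h3>
10.0.0
<h3>Operating system(s):</h3>
AIX, Linux, Windows
<h3>Reference #:</h3>
7042947
</div>"""

root = ET.fromstring(SNIPPET)

# h3[2] is the second <h3> child of the root <div>.
second_h3 = root.find("h3[2]")

# tail holds the text between this element's end tag and the next tag,
# which plays the role of following-sibling::text()[1] here.
operating_systems = second_h3.tail.strip()
print(operating_systems)
```

Changing the index to 1 or 3 picks the software version or the reference number in the same way.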
Upvotes: 0