Reputation: 156
I need to extract the text that comes before the two consecutive <br> tags, that is, text3. The markup is similar to the following:
<div>
<br>
text1
<br>
text2
<br>
text3
<br>
<br>
text4
<br>
</div>
I tried //div/text()[preceding-sibling::br], but it extracts all of the text nodes.
Upvotes: 0
Views: 1281
Reputation: 89315
Finding two consecutive <br> elements in this scenario turns out to be trickier than I expected, because empty text nodes (the ones that consist only of whitespace) need to be ignored here. This is one way:
/br[
following-sibling::node()[self::*|self::text()[normalize-space()]
][1][self::br]]
The first predicate finds the following sibling nodes that are either element nodes (self::*) or non-empty text nodes (self::text()[normalize-space()]). Then [1] takes only the first such node, and lastly [self::br] checks that this node is a <br>.
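To make the predicate logic concrete, here is a rough sketch that imitates it as a plain node walk using only Python's standard library, with no XPath engine involved. The class DivChildren, the function text_before_double_br and the constant SAMPLE_MARKUP are hypothetical names made up for this illustration, and the sketch assumes flat markup like the question's (a single <div> whose children are only <br> elements and text).

```python
from html.parser import HTMLParser

# Markup from the question (assumed to be representative).
SAMPLE_MARKUP = """<div>
<br>
text1
<br>
text2
<br>
text3
<br>
<br>
text4
<br>
</div>"""


class DivChildren(HTMLParser):
    """Record the <br> and text children of a top-level <div> in document order."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of <div> elements currently open
        self.nodes = []  # ("br", "") or ("text", data) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
        elif tag == "br" and self.depth == 1:
            self.nodes.append(("br", ""))

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 1:
            self.nodes.append(("text", data))


def text_before_double_br(markup):
    parser = DivChildren()
    parser.feed(markup)
    parser.close()
    nodes = parser.nodes
    results = []
    for i, (kind, _) in enumerate(nodes):
        if kind != "br":
            continue
        # following-sibling::node()[self::*|self::text()[normalize-space()]]
        # Keep elements and non-empty text; drop whitespace-only text nodes.
        following = [n for n in nodes[i + 1:] if n[0] == "br" or n[1].strip()]
        # [1][self::br]: the first remaining node must itself be a <br>.
        if following and following[0][0] == "br":
            # preceding-sibling::text()[1]: nearest text node before this <br>.
            for prev_kind, prev_data in reversed(nodes[:i]):
                if prev_kind == "text":
                    results.append(prev_data.strip())
                    break
    return results


print(text_before_double_br(SAMPLE_MARKUP))  # ['text3']
```

The walk only matches the fourth <br>: the whitespace between it and the fifth <br> is filtered out, so the first remaining following node is a <br>. The last <br> does not match because nothing but whitespace follows it.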
The complete XPath expression would be as follows:
//div
/br[
following-sibling::node()[self::*|self::text()[normalize-space()]
][1][self::br]]
/preceding-sibling::text()[1]
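As a quick way to try the expression, the sketch below evaluates it with lxml, which is an assumption on my part since the question does not say which XPath engine is in use. The helper xpath_texts and the constant names are hypothetical. The returned text nodes still carry the surrounding newlines, so the helper strips them.

```python
from lxml import html  # third-party package, assumed installed (pip install lxml)

# Markup from the question (assumed to be representative).
QUESTION_MARKUP = """<div>
<br>
text1
<br>
text2
<br>
text3
<br>
<br>
text4
<br>
</div>"""

# The expression from the question, which matches every text node after the first <br>.
ORIGINAL_ATTEMPT = "//div/text()[preceding-sibling::br]"

# The complete expression from this answer, joined into one string.
TEXT_BEFORE_DOUBLE_BR = (
    "//div"
    "/br["
    "following-sibling::node()[self::*|self::text()[normalize-space()]]"
    "[1][self::br]]"
    "/preceding-sibling::text()[1]"
)


def xpath_texts(markup, expression):
    """Evaluate a text-selecting XPath and return the non-empty, stripped results."""
    doc = html.document_fromstring(markup)
    return [t.strip() for t in doc.xpath(expression) if t.strip()]


print(xpath_texts(QUESTION_MARKUP, ORIGINAL_ATTEMPT))       # ['text1', 'text2', 'text3', 'text4']
print(xpath_texts(QUESTION_MARKUP, TEXT_BEFORE_DOUBLE_BR))  # ['text3']
```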
Upvotes: 5