neenkart

Reputation: 156

XPath to get the text before two consecutive <br> tags

I need to extract the text that comes before the two consecutive <br> tags, that is, text3. The markup is similar to the following:

<div>
    <br>
    text1
    <br>
    text2
    <br>
    text3
    <br>
    <br>
    text4
    <br>
</div>

I tried //div/text()[preceding-sibling::br], but it extracts all of the text nodes.

Upvotes: 0

Views: 1281

Answers (1)

har07

Reputation: 89315

Finding two consecutive <br>s in this scenario turns out to be trickier than I expected, because empty text nodes (those consisting only of whitespace) need to be ignored. This is one way:

/br[
    following-sibling::node()[self::* | self::text()[normalize-space()]][1][self::br]
]

The first predicate selects the following-sibling nodes that are either element nodes (self::*) or non-empty text nodes (self::text()[normalize-space()]). Then [1] keeps only the first such node, and finally [self::br] checks that this node is a <br>.

The complete XPath expression would be as follows:

//div
 /br[
    following-sibling::node()[self::* | self::text()[normalize-space()]][1][self::br]
 ]
 /preceding-sibling::text()[1]
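
To try the expression outside a browser, here is a minimal sketch in Python. It assumes the third-party lxml library is installed (the question does not say which XPath engine is in use), and the names SAMPLE, XPATH, matches and attempt are only illustrative. The sample markup is the one from the question. Note that the selected text node comes back with its surrounding whitespace, so the sketch strips it.

```python
from lxml import html  # assumption: lxml is available (pip install lxml)

# Markup copied from the question.
SAMPLE = """
<div>
    <br>
    text1
    <br>
    text2
    <br>
    text3
    <br>
    <br>
    text4
    <br>
</div>
"""

# The complete expression from the answer, joined into a single string.
XPATH = (
    "//div"
    "/br[following-sibling::node()"
    "[self::* | self::text()[normalize-space()]][1][self::br]]"
    "/preceding-sibling::text()[1]"
)

tree = html.fromstring(SAMPLE)

# The matched text node still carries its newlines and indentation.
matches = [t.strip() for t in tree.xpath(XPATH)]
print(matches)  # ['text3']

# For comparison, the expression tried in the question matches every
# text node that has any <br> before it, including whitespace-only ones.
attempt = [
    t.strip()
    for t in tree.xpath("//div/text()[preceding-sibling::br]")
    if t.strip()
]
print(attempt)  # ['text1', 'text2', 'text3', 'text4']
```

If the markup can contain more than one pair of consecutive <br>s, the expression returns one text node per pair, so matches may hold several entries.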

Upvotes: 5
