XPath to select preceding element with optional intervening whitespace-only text node

Question

Given an element as context, I want to select the preceding sibling element and check whether it has a particular name. The caveat is that I do not want to select it if there is an intervening text node that has non-whitespace content.

For example, given this XML document…

…then:

For "a1" there should be no match (there is no sibling element immediately preceding it)

For "a2" then "a1" should be matched (there is no intervening text node)

For "a3" there should be no match (there is an intervening text node with non-whitespace contents)

For "a4" then "a3" should be matched (the intervening text node is only whitespace)

For "a5" there should be no match (the preceding sibling element is not an <a>).

I can check to see if the preceding sibling is an <a> with preceding-sibling::*[1][name()="a"]

However, I can't figure out how to say "select the following sibling node, whether it is an element or text, and check that it is either not text or has normalize-space(.)=''". My best guess was this:

preceding-sibling::*[1][name()="a"][following-sibling::node()[1][not(text()) or normalize-space(.)=""]]

…but that appears to have no effect.

Here's my test Ruby file:

require 'nokogiri'
xpath = 'preceding-sibling::*[1][name()="a"][following-sibling::node()[1][not(text()) or normalize-space(.)=""]]'
fragment = Nokogiri::XML.fragment 'a1a2 b a3 a4 a5'
fragment.css('a').each{ |a| p [a.text,a.xpath(xpath).to_s] }
#=> ["a1", ""]
#=> ["a2", ""]
#=> ["a3", "a2"]
#=> ["a4", "a3"]
#=> ["a5", ""]

The results for "a2" and "a3" are wrong, and they confuse me. The expression finds the preceding <a> correctly, but then does not correctly verify that the first following sibling of that <a> is either not text (which should allow "a2" to find "a1") or whitespace only (which should prevent "a3" from finding "a2").

Edit: Here's the XPath I was writing, and what I intended it to do:

preceding-sibling::*[1][name()="a"]… - find the first preceding element, and ensure that it is an <a>. This appears to be working as desired.

[following-sibling::node()[1][…]] - ensure that the first following node (of the found preceding <a>) matches some conditions

not(text()) or normalize-space(.)="" - ensure that this following node is either not a text node, or that the normalized space of it is empty

Dimitre Novatchev · Accepted Answer

Use:

/*/a/preceding-sibling::node()
       [not(self::text()[not(normalize-space())])]
            [1]
              [self::a]

XSLT-based verification:

When this transformation is applied on the provided XML document:

the XPath expression is evaluated and the nodes selected by this evaluation are copied to the output:

a1 a3

Update:

What is wrong with the XPath expression in the question?

The problem is here:

[not(text()) or normalize-space(.)='']

This tests if the context node doesn't have a text node child.

But the OP wants to test if the context node is a text node.

Solution:

Replace the above with:

[not(self::text()) or normalize-space(.)='']

XSLT-based verification:

Now this transformation produces exactly the wanted result:

a1 a3
