Phrogz

Reputation: 303244

XPath to select preceding element with optional intervening whitespace-only text node

Given an element as context, I want to select the preceding sibling element and check whether it has a particular name. The caveat is that I do not want to select it if there is an intervening text node with non-whitespace content.

For example, given this XML document…

<r>
  <a>a1</a><a>a2</a>
   b
  <a>a3</a>
    <a>a4</a>
  <b/>
  <a>a5</a>
</r>

…then:


I can check to see if the preceding sibling is an <a> with preceding-sibling::*[1][name()="a"]

However, I can't figure out how to say "select the following sibling node, whether it is an element or text, and check that it is either not text or has normalize-space(.)=""". My best guess was this:

preceding-sibling::*[1][name()="a"][following-sibling::node()[1][not(text()) or normalize-space(.)=""]]

…but that appears to have no effect.


Here's my test Ruby file:

require 'nokogiri'

xpath = 'preceding-sibling::*[1][name()="a"][following-sibling::node()[1][not(text()) or normalize-space(.)=""]]'
fragment = Nokogiri::XML.fragment '<a>a1</a><a>a2</a> b <a>a3</a> <a>a4</a> <b/> <a>a5</a>'    

fragment.css('a').each{ |a| p [a.text,a.xpath(xpath).to_s] }
#=> ["a1", ""]
#=> ["a2", ""]
#=> ["a3", "<a>a2</a>"]
#=> ["a4", "<a>a3</a>"]
#=> ["a5", ""]

The results for "a2" and "a3" are wrong, and they confuse me. The expression finds the preceding <a> correctly, but then does not correctly verify that the first following sibling of that element is either not text (which should allow "a2" to find "a1") or whitespace-only text (which should prevent "a3" from finding "a2").


Edit: Here's the XPath I was writing, and what I intended it to do:

Upvotes: 3

Views: 2479

Answers (1)

Dimitre Novatchev

Reputation: 243449

Use:

/*/a/preceding-sibling::node()
       [not(self::text()[not(normalize-space())])]
            [1]
              [self::a]

XSLT-based verification:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
     <xsl:copy-of select=
       "/*/a
          /preceding-sibling::node()
                      [not(self::text()[not(normalize-space())])]
                                        [1]
                                         [self::a]
    "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<r>
  <a>a1</a><a>a2</a>
   b
  <a>a3</a>
    <a>a4</a>
  <b/>
  <a>a5</a>
</r>

the XPath expression is evaluated, and the nodes it selects are copied to the output:

<a>a1</a>
<a>a3</a>

Update:

What is wrong with the XPath expression in the question?

The problem is here:

[not(text()) or normalize-space(.)='']

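This tests whether the context node has no text node child. For the element <a>a2</a> that follows a1, text() matches its text child, so not(text()) is false and the predicate fails. For the text node " b " that follows a2, which has no children at all, not(text()) is true, so the predicate succeeds regardless of the whitespace check.

But the OP wants to test whether the context node itself is a text node.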

Solution:

Replace the above with:

[not(self::text()) or normalize-space(.)='']

XSLT-based verification:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/*/a">
     <xsl:copy-of select=
     "preceding-sibling::*[1]
                      [name()='a']
                         [following-sibling::node()[1]
                                    [not(self::text()) or normalize-space(.)='']
                       ]"/>
 </xsl:template>
 <xsl:template match="text()"/>
</xsl:stylesheet>

Now this transformation produces exactly the wanted result:

<a>a1</a>
<a>a3</a>

Upvotes: 5
