Reputation: 303244
Given an element as context I want to select the preceding sibling element and check to see if it has a particular name. The caveat is that I do not want to select it if there is an intervening text node that has non-whitespace content.
For example, given this XML document…
<r>
<a>a1</a><a>a2</a>
b
<a>a3</a>
<a>a4</a>
<b/>
<a>a5</a>
</r>
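…then, identifying each <a> by its text content:
- "a1" should select nothing (it has no preceding sibling element).
- "a2" should select "a1".
- "a3" should select nothing (the non-whitespace text "b" intervenes).
- "a4" should select "a3" (only whitespace intervenes).
- "a5" should select nothing (the sibling element immediately preceding it is a <b>, not an <a>).
I can check whether the preceding sibling element is an <a> with:
preceding-sibling::*[1][name()="a"]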
However, I can't figure out how to say "select the first following sibling node, whether it is an element or text, and check that it is either not text or has normalize-space(.)=""". My best guess was this:
preceding-sibling::*[1][name()="a"][following-sibling::node()[1][not(text()) or normalize-space(.)=""]]
…but that appears to have no effect.
Here's my test Ruby file:
require 'nokogiri'
xpath = 'preceding-sibling::*[1][name()="a"][following-sibling::node()[1][not(text()) or normalize-space(.)=""]]'
fragment = Nokogiri::XML.fragment '<a>a1</a><a>a2</a> b <a>a3</a> <a>a4</a> <b/> <a>a5</a>'
fragment.css('a').each{ |a| p [a.text,a.xpath(xpath).to_s] }
#=> ["a1", ""]
#=> ["a2", ""]
#=> ["a3", "<a>a2</a>"]
#=> ["a4", "<a>a3</a>"]
#=> ["a5", ""]
The results for "a2" and "a3" are wrong, and they confuse me. The expression finds the preceding <a> correctly, but then does not correctly verify that the first following sibling of that element is either not text (which should allow "a2" to find "a1") or whitespace-only (which should prevent "a3" from finding "a2").
Edit: Here's the XPath I was writing, and what I intended it to do:
preceding-sibling::*[1][name()="a"]…
- find the first preceding element, and ensure that it is an <a>. This appears to be working as desired.
[following-sibling::node()[1][…]]
- ensure that the first following node (of the found preceding <a>) matches some conditions
not(text()) or normalize-space(.)=""
- ensure that this following node is either not a text node, or that the normalized space of it is empty
Upvotes: 3
Views: 2479
Reputation: 243449
Use:
/*/a/preceding-sibling::node()
[not(self::text()[not(normalize-space())])]
[1]
[self::a]
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/a
/preceding-sibling::node()
[not(self::text()[not(normalize-space())])]
[1]
[self::a]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<r>
<a>a1</a><a>a2</a>
b
<a>a3</a>
<a>a4</a>
<b/>
<a>a5</a>
</r>
the XPath expression is evaluated, and the nodes it selects are copied to the output:
<a>a1</a>
<a>a3</a>
Update:
What is wrong with the XPath expression in the question?
The problem is here:
[not(text()) or normalize-space(.)='']
This tests whether the context node has no text node child (text() is short for child::text()).
But the OP wants to test whether the context node itself is a text node.
Solution:
Replace the above with:
[not(self::text()) or normalize-space(.)='']
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/*/a">
<xsl:copy-of select=
"preceding-sibling::*[1]
[name()='a']
[following-sibling::node()[1]
[not(self::text()) or normalize-space(.)='']
]"/>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
Now this transformation produces exactly the wanted result:
<a>a1</a>
<a>a3</a>
Upvotes: 5