markmnl

Reputation: 11426

XSLT/XPath: selecting unique child nodes only, but non-unique nodes are not selected at all

<root>
  <parent>
    <child>
     <name>John</name>
    </child>
    <child>
      <name>Ben</name>
    </child>
  </parent>
  <parent>
    <child>
     <name>John</name>
    </child>
    <child>
     <name>Mark</name>
    </child>
    <child>
      <name>Luke</name>
    </child>
 </parent>
</root>

I want unique child nodes only, i.e. only one child node when more than one has the same name.

Such as:

John Ben Mark Luke

I have tried:

<xsl:for-each select="parent">
  <xsl:for-each select="child[name != preceding::name]">
    <xsl:value-of select="name"/>
  </xsl:for-each>
</xsl:for-each>

But I get:

Ben Mark Luke

?!

Upvotes: 2

Views: 1341

Answers (1)

Dimitre Novatchev

Reputation: 243449

Your problem is that you are using the != operator in a comparison where an operand is a node-set.

This is wrong -- avoid the != operator whenever one of the operands in the comparison is a node-set; use the not() function together with the = operator instead.

Below is a correct solution:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/*">
    <xsl:for-each select="parent">
      <xsl:for-each select="child[not(name = preceding::name)]">
        <xsl:value-of select="concat(name, ' ')"/>
      </xsl:for-each>
    </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<root>
  <parent>
    <child>
     <name>John</name>
    </child>
    <child>
      <name>Ben</name>
    </child>
  </parent>
  <parent>
    <child>
     <name>John</name>
    </child>
    <child>
     <name>Mark</name>
    </child>
    <child>
      <name>Luke</name>
    </child>
 </parent>
</root>

the wanted, correct result is produced:

John Ben Mark Luke 

Explanation: Here is how the W3C XPath 1.0 spec defines the semantics of the != operator:

"If one object to be compared is a node-set and the other is a string, then the comparison will be true if and only if there is a node in the node-set such that the result of performing the comparison on the string-value of the node and the other string is true."

This means that

's' != node-set

is true whenever node-set contains at least one node whose string-value isn't equal to 's' (and it is false when node-set is empty).

This isn't the semantics that is wanted.

On the other hand,

not('s' = node-set)

is true only if there isn't any node in node-set that is equal to 's'.

This is exactly the wanted comparison.
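To see the difference between the two forms side by side, here is a minimal sketch that evaluates both against the XML document from the question. The element name nosuch is hypothetical; it is used only to obtain an empty node-set.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:template match="/">
   <!-- //name selects every name in the document: John, Ben, John, Mark, Luke -->
   <!-- != is existential: true, because some name (e.g. Ben) differs from 'John' -->
   <xsl:value-of select="'John' != //name"/>
   <xsl:text> </xsl:text>
   <!-- not(=): false, because at least one name equals 'John' -->
   <xsl:value-of select="not('John' = //name)"/>
   <xsl:text> </xsl:text>
   <!-- Empty node-set (hypothetical element nosuch): != is false ... -->
   <xsl:value-of select="'John' != //nosuch"/>
   <xsl:text> </xsl:text>
   <!-- ... while not(=) is true -->
   <xsl:value-of select="not('John' = //nosuch)"/>
 </xsl:template>
</xsl:stylesheet>
```

Applied to the provided document, this should print: true false false true. The third value shows why the first John is dropped by the != version: it has no preceding name elements, so the comparison has no node for which it could be true.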

Do note: The grouping technique you have chosen is O(N^2) and should only be used on very small sets of values to be deduplicated. If efficiency is needed, by all means use the Muenchian method for grouping (discussing or demonstrating it falls outside the scope of this question).
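For reference only, a minimal sketch of Muenchian grouping for this document might look like the following. The key name kChildByName is an arbitrary choice made for this illustration, and the sketch assumes the same input structure as in the question.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <!-- Index every child element by the string-value of its name child -->
 <xsl:key name="kChildByName" match="child" use="name"/>

 <xsl:template match="/*">
   <!-- Keep a child only if it is the first node, in document order,
        of the group that shares its name -->
   <xsl:for-each select=
     "parent/child[generate-id() = generate-id(key('kChildByName', name)[1])]">
     <xsl:value-of select="concat(name, ' ')"/>
   </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

On the provided document this should also yield John Ben Mark Luke. The key lookup replaces the scan of all preceding name elements that the preceding:: axis performs for every child.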

Upvotes: 7
