Mike

Reputation: 33

How to remove duplicates based on level in hierarchy?

I have the following XML structure:

<node name="A">
  <node name="B">
    <node name="C"/>
    <node name="D"/>
    <node name="E"/>
  </node>
  <node name="D"/>
  <node name="E"/>
</node>

I need to get all the leaf nodes, which I select with //node[not(node)]. Now I need to remove duplicates (nodes with the same name), keeping the elements that are deeper in the hierarchy. How do I do that?

Upvotes: 3

Views: 558

Answers (1)

Dimitre Novatchev

Reputation: 243479

I. XSLT 1.0 Solution:

This transformation:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- All leaf elements: <node> elements that have no <node> children -->
    <xsl:variable name="vallLeaves" select="//node[not(node)]"/>

 <xsl:template match="/">
$vallLeaves:
     <xsl:copy-of select="$vallLeaves"/>

$vallDistinctLeaves:    
     <xsl:for-each select="$vallLeaves">
       <!-- Keep a leaf only if it is the first leaf with this name in document order -->
       <xsl:if test="generate-id() = generate-id($vallLeaves[@name = current()/@name][1])">
         <xsl:copy-of select="."/>
       </xsl:if>
     </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<node name="A">
  <node name="B">
    <node name="C"/>
    <node name="D"/>
    <node name="E"/>
  </node>
  <node name="D"/>
  <node name="E"/>
</node>

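produces the wanted, correct result. Note that the transformation keeps the first leaf with each name in document order; in this document that first occurrence is also the deeper one: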

$vallLeaves:
     <node name="C"/>
<node name="D"/>
<node name="E"/>
<node name="D"/>
<node name="E"/>

$vallDistinctLeaves:    
     <node name="C"/>
<node name="D"/>
<node name="E"/>
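
The transformation above relies on the deeper duplicate coming first in document order. If a shallower duplicate could precede a deeper one, the selection may need to look at depth explicitly. The following is only a minimal sketch under stated assumptions: depth is taken to be the number of ancestor elements, the key name kLeafByName is made up for illustration, and ties between equally deep duplicates are resolved in favor of the first one in document order.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Hypothetical key: index every leaf <node> by its name attribute -->
    <xsl:key name="kLeafByName" match="node[not(node)]" use="@name"/>

    <xsl:template match="/">
      <!-- Visit one representative leaf per distinct name (Muenchian grouping) -->
      <xsl:for-each select="//node[not(node)]
                              [generate-id() = generate-id(key('kLeafByName', @name)[1])]">
        <!-- Within the group, sort by depth (deepest first) and copy only the first -->
        <xsl:for-each select="key('kLeafByName', @name)">
          <xsl:sort select="count(ancestor::*)" data-type="number" order="descending"/>
          <xsl:if test="position() = 1">
            <xsl:copy-of select="."/>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

For the sample document this selects the C, D and E leaves that are children of B, because each of them has two ancestor elements while the duplicates directly under A have only one.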

II. XSLT 2.0 Solution:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- All leaf elements: <node> elements that have no <node> children -->
    <xsl:variable name="vallLeaves" select="//node[not(node)]"/>
    <xsl:variable name="vallDistinctLeaves" as="element()*">
      <xsl:for-each-group select="$vallLeaves" group-by="@name">
       <xsl:sequence select="."/>
      </xsl:for-each-group>
    </xsl:variable>

 <xsl:template match="/">
$vallLeaves:
     <xsl:sequence select="$vallLeaves"/>

$vallDistinctLeaves:    
     <xsl:sequence select="$vallDistinctLeaves"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the same XML document (above), the same correct result is produced:

$vallLeaves:
     <node name="C"/>
<node name="D"/>
<node name="E"/>
<node name="D"/>
<node name="E"/>

$vallDistinctLeaves:    
     <node name="C"/>
<node name="D"/>
<node name="E"/>
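
The XSLT 2.0 grouping above also returns the first item of each group in document order. A depth-aware variant might look like the sketch below. It is an illustration rather than a definitive implementation: depth is assumed to mean the number of ancestor elements, and the variable name vMaxDepth is made up for this example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/">
      <!-- Group the leaf <node> elements by name -->
      <xsl:for-each-group select="//node[not(node)]" group-by="@name">
        <!-- Hypothetical variable: the greatest depth found in the current group -->
        <xsl:variable name="vMaxDepth"
                      select="max(current-group()/count(ancestor::*))"/>
        <!-- Emit the first (in document order) of the deepest leaves in the group -->
        <xsl:sequence
            select="current-group()[count(ancestor::*) = $vMaxDepth][1]"/>
      </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>
```

Computing the maximum depth per group avoids depending on document order, so the deeper leaf is kept even when a shallower duplicate appears earlier in the document.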

Upvotes: 1
