Munaaf Ghumran

Reputation: 25

XSLT 2.0: Remove duplicate values in node

I have been trying to figure out how to remove elements with duplicate values from an XML document using XSLT.

For example, given this input:

<main>
   <h1>
      <node1>duplicate</node1>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
      <node1>duplicate</node1>
   </h1>
</main>

Expected output:

<main>
   <h1>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
   </h1>
</main>

I'm sure this can't be too complicated, but I haven't been able to understand any of the methods I've seen so far. Thanks!

Thanks to Michael below! I have a follow-up question: what if the example above had more nodes (which would never be duplicates), for example:

<main>
   <h1>
      <node1>duplicate</node1>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
      <node1>duplicate</node1>
      <node2> Data </node2>
   </h1>
</main>

How would I carry this data through in the XSLT? The solution below drops all the additional data, even though my understanding was that the identity transform copies everything and that the other template only modifies the nodes it matches.

Upvotes: 1

Views: 157

Answers (2)

Dimitre Novatchev

Reputation: 243579

Do note that the currently accepted answer is incorrect!

Even the 3rd solution, which doesn't lose elements, is incorrect, because it doesn't preserve the order of the elements.

Given this XML document:

<main>
   <h1>
      <node1>duplicate</node1>
      <node2> Data </node2>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
      <node1>duplicate</node1>
   </h1>
</main>

the last (3rd) transformation in the accepted answer:

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="h1">
    <xsl:copy>
        <xsl:for-each-group select="node1" group-by=".">
            <node1>
                <xsl:value-of select="."/>
            </node1>
        </xsl:for-each-group>
        <xsl:apply-templates select="* except node1"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

produces a result in which <node2> comes after all the <node1> elements, which is clearly not what one expects from an "identity" transform that only drops duplicates:

<?xml version="1.0" encoding="UTF-8"?>
<main>
   <h1>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
      <node2> Data </node2>
   </h1>
</main>
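To make the ordering problem concrete, here is a small Python model of what the grouping template above emits for one h1. It is a sketch using only the standard library's xml.etree.ElementTree, not XSLT and not part of either answer, and the function name group_then_rest is hypothetical. It outputs one node1 per distinct string value, in order of first appearance (the order xsl:for-each-group uses), followed by the element children that * except node1 selects.

```python
import xml.etree.ElementTree as ET

SOURCE = """<main>
   <h1>
      <node1>duplicate</node1>
      <node2> Data </node2>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
      <node1>duplicate</node1>
   </h1>
</main>"""


def group_then_rest(h1):
    """Hypothetical model of the 'for-each-group, then * except node1' template."""
    distinct = []
    for child in h1:
        if child.tag == "node1":
            value = "".join(child.itertext())  # string value, like group-by="."
            if value not in distinct:
                distinct.append(value)  # keep order of first appearance
    # '* except node1': the remaining element children, emitted only afterwards
    rest = [(c.tag, "".join(c.itertext())) for c in h1 if c.tag != "node1"]
    return [("node1", v) for v in distinct] + rest


h1 = ET.fromstring(SOURCE).find("h1")
print(group_then_rest(h1))
```

Because all node1 output is produced before any other child is processed, node2 necessarily ends up last, regardless of where it was in the input.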

Now a correct and very short solution :)

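<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <!-- index every h1/node1 by its string value -->
 <xsl:key name="kNode1ByVal" match="h1/node1" use="."/>

  <!-- identity transform: copy every node and attribute as-is -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- key() returns nodes in document order, so [1] is the first node1 with this value;
       any other node1 with the same value matches this empty template and is dropped -->
  <xsl:template match="h1/node1[not(. is key('kNode1ByVal',.)[1])]"/>
</xsl:stylesheet>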

This produces the expected, correct result, even when applied to the XML document above. Note that the order of the <node1> and <node2> elements is preserved:

<main>
   <h1>
      <node1>duplicate</node1>
      <node2> Data </node2>
      <node1>New data</node1>
      <node1>New data 2</node1>
   </h1>
</main>
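The key-based filter can be modelled the same way. This Python sketch (again only xml.etree.ElementTree; the function name drop_later_duplicates is hypothetical) copies everything and removes an h1/node1 only when an earlier h1/node1 in document order has the same string value, which is what the `. is key('kNode1ByVal',.)[1]` test checks. Because the key is not scoped to a single h1, the model, like the stylesheet, also treats a value repeated in a later, different h1 as a duplicate.

```python
import xml.etree.ElementTree as ET

SOURCE = """<main>
   <h1>
      <node1>duplicate</node1>
      <node2> Data </node2>
      <node1>duplicate</node1>
      <node1>New data</node1>
      <node1>New data 2</node1>
      <node1>duplicate</node1>
   </h1>
</main>"""


def string_value(elem):
    # XPath string value: concatenation of all descendant text
    return "".join(elem.itertext())


def drop_later_duplicates(root):
    """Hypothetical model: remove every h1/node1 that is not the first
    h1/node1 in document order with its string value."""
    parent = {child: p for p in root.iter() for child in p}

    def is_h1_node1(node):
        p = parent.get(node)
        return p is not None and p.tag == "h1"

    first_by_value = {}
    # root.iter() walks in document order, like the result of key()
    for node in root.iter("node1"):
        if is_h1_node1(node):
            first_by_value.setdefault(string_value(node), node)
    for node in list(root.iter("node1")):
        if is_h1_node1(node) and first_by_value[string_value(node)] is not node:
            parent[node].remove(node)  # the empty template: output nothing
    return root


root = drop_later_duplicates(ET.fromstring(SOURCE))
print([(c.tag, c.text) for c in root.find("h1")])
```

Since duplicates are removed in place instead of being rebuilt from grouped values, the relative position of node2 (and of any other sibling) is never touched.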

Upvotes: 3

michael.hor257k

Reputation: 117140

Here's one way:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

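<!-- one node1 per distinct value, in order of first appearance;
     other children of h1 are not processed, so they are not output -->
<xsl:template match="h1">
    <xsl:copy>
        <xsl:for-each-group select="node1" group-by=".">
            <node1>
                <xsl:value-of select="."/>
            </node1>
        </xsl:for-each-group>
    </xsl:copy>
</xsl:template>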

</xsl:stylesheet>

Here's another:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

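<!-- distinct-values() returns strings, so new node1 elements are constructed;
     in XPath 2.0 the order of its result is implementation-dependent -->
<xsl:template match="h1">
    <xsl:copy>
        <xsl:for-each select="distinct-values(node1)">
            <node1>
                <xsl:value-of select="."/>
            </node1>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>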

</xsl:stylesheet>

Added:

To process other nodes under the h1 header, add the following instruction:

<xsl:apply-templates select="* except node1"/>

For example (in the first case):

<xsl:template match="h1">
    <xsl:copy>
        <xsl:for-each-group select="node1" group-by=".">
            <node1>
                <xsl:value-of select="."/>
            </node1>
        </xsl:for-each-group>
        <xsl:apply-templates select="* except node1"/>
    </xsl:copy>
</xsl:template>

Upvotes: 2
