Jerome Hillman

Reputation: 13

XSLT to remove duplicates completely

I know there are plenty of solutions for removing duplicates, but this case is slightly different: if a value occurs more than once, I need to remove every element with that value from the output, not just the extra copies. Input:

<SanctionList>
    <row>
        <PersonId>1000628</PersonId>
        <PersonId>1000634</PersonId>
        <PersonId>1113918</PersonId>
        <PersonId>1133507</PersonId>
        <PersonId>1113918</PersonId>
    </row>
</SanctionList>

Output expected:

<SanctionList>
    <row>
        <PersonId>1000628</PersonId>
        <PersonId>1000634</PersonId>
        <PersonId>1133507</PersonId>
    </row>
</SanctionList>

Here is what I tried, but the processor returns 1 for each of the groups. Shouldn't it return 2 for PersonId 1113918, since it appears twice in the list?

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
    <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>    
    </xsl:template>

    <xsl:template match="SanctionList">
        <xsl:for-each-group select="row" group-by="PersonId">
            <xsl:text> Count for </xsl:text>
            <xsl:value-of select="current-grouping-key()" />
                <xsl:text> is </xsl:text>
            <xsl:value-of select="count(current-group())" />
        </xsl:for-each-group>
    </xsl:template> 
</xsl:stylesheet>

Thanks kindly!

Upvotes: 1

Views: 223

Answers (1)

Dimitre Novatchev

Reputation: 243469

I know there are solutions plenty to remove duplicates, but this one is slightly different. I need to remove the element from the output if it is a duplicate

Use this short and simple transformation (it works in both XSLT 2.0 and XSLT 1.0):

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:key name="kPersonByVal" match="PersonId" use="."/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="PersonId[key('kPersonByVal', .)[2]]"/>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<SanctionList>
    <row>
        <PersonId>1000628</PersonId>
        <PersonId>1000634</PersonId>
        <PersonId>1113918</PersonId>
        <PersonId>1133507</PersonId>
        <PersonId>1113918</PersonId>
    </row>
</SanctionList>

the wanted, correct result is produced:

<SanctionList>
   <row>
      <PersonId>1000628</PersonId>
      <PersonId>1000634</PersonId>
      <PersonId>1133507</PersonId>
   </row>
</SanctionList>

Explanation:

  1. A well-known design pattern for copying an existing XML document while deleting, replacing, or inserting some nodes in the copy is to override the identity rule.
  2. In this particular case the task is to delete <PersonId> elements. This is done by providing a matching template with no (empty) body.
  3. The criterion for deletion is that the element must have a duplicate: at least two <PersonId> elements with the same string value must exist. This is most conveniently checked using an <xsl:key> declaration and the key() function, which returns all elements with a given string value.
  4. Finally, in the match pattern of the empty (deleting) template, we check whether the node-set of equally-valued elements has a second element. If it does, the value occurs more than once and every element carrying it is deleted.
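
One point the steps above leave open: the key is built over the whole document, so a <PersonId> value that appears once in each of two different <row> elements would also count as a duplicate. The question does not say whether that is wanted. If duplicates should only be detected within the same <row>, a possible sketch is to make the key composite, combining the parent's identity with the value. The key name kPersonByRowAndVal and the '|' separator are illustrative choices, not part of the original answer:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Assumption: duplicates are scoped to one <row>.
      The key value combines the parent's generated id with the string value. -->
 <xsl:key name="kPersonByRowAndVal" match="PersonId"
          use="concat(generate-id(..), '|', .)"/>

  <!-- Identity rule: copy every node as-is unless a more specific template matches -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Delete a PersonId when a second element with the same value exists in the same row -->
  <xsl:template match="PersonId[key('kPersonByRowAndVal',
                                    concat(generate-id(..), '|', .))[2]]"/>
</xsl:stylesheet>
```

For the single-row input in the question, this should give the same result as the document-wide key; the two only differ when a value is repeated across rows.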

Note: You can learn more about the <xsl:key> declaration and the key() function in module 9 of my Pluralsight training course "XSLT 2.0 and 1.0 foundations"
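
As for why the attempt in the question reports 1 for every group: select="row" makes the single <row> element the only item being grouped. With group-by="PersonId" that one <row> is placed in a group for each distinct PersonId value, and an item is added to a given group at most once, so count(current-group()) is always 1. Grouping the <PersonId> elements themselves gives the expected counts. The following is a minimal XSLT 2.0 sketch of that idea, assuming duplicates are judged per <row> and that <row> has only <PersonId> children (other child elements would be dropped by this template):

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity rule: copy everything not handled below -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="row">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Group the PersonId elements by their own value, not the row -->
      <xsl:for-each-group select="PersonId" group-by=".">
        <!-- Keep a value only when its group holds exactly one element -->
        <xsl:if test="count(current-group()) = 1">
          <xsl:apply-templates select="current-group()"/>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Groups are processed in order of first appearance, so the surviving elements keep their original relative order. Unlike the key-based solution, this variant requires an XSLT 2.0 processor.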

Upvotes: 1
