jmac

Reputation: 255

Removing Duplicate Elements from my XML file by using an XSLT

I'd like to remove duplicate entries that share the same ID. I'm pulling hits from system 'A' and system 'B', and I want system 'A' to take precedence (i.e., if an ID is duplicated, remove the element from system 'B').

I am getting this result:

<HitList>
   <Hit System="A" ID="1"/>
   <Hit System="A" ID="2"/>
   <Hit System="A" ID="2"/>
   <Hit System="B" ID="1"/>
   <Hit System="B" ID="2"/>
   <Hit System="B" ID="3"/>
   <Hit System="B" ID="4"/>
</HitList>

I want this result (with the duplicates removed):

<HitList>
   <Hit System="A" ID="1"/>
   <Hit System="A" ID="2"/>
   <Hit System="B" ID="3"/>
   <Hit System="B" ID="4"/>
</HitList>

Current Code:

<xsl:template match="/RetrievePersonSearchDataRequest">
    <HitList>
        <!-- Paths are relative to the matched RetrievePersonSearchDataRequest element;
             the status is compared against the string literal 'Succeeded' -->
        <xsl:if test="SystemA/NamecheckResponse/@Status = 'Succeeded'">
            <xsl:for-each select="SystemA/NamecheckResponse/BATCH/ITEMLIST/ITEM/VISQST/NCHITLIST/NCHIT">
                <Hit>
                    <xsl:attribute name="System"><xsl:text>A</xsl:text></xsl:attribute>
                    <xsl:attribute name="PersonID"><xsl:value-of select="number(REFUSAL/@UID)"/></xsl:attribute>
                </Hit>
            </xsl:for-each>
        </xsl:if>
        <xsl:if test="SystemB/NamecheckResponse/@Status = 'Succeeded'">
            <xsl:for-each select="SystemB/NamecheckResponse/PersonIDSearchResponse/personID">
                <Hit>
                    <xsl:attribute name="System"><xsl:text>B</xsl:text></xsl:attribute>
                    <xsl:attribute name="PersonID"><xsl:value-of select="number(.)"/></xsl:attribute>
                </Hit>
            </xsl:for-each>
        </xsl:if>
    </HitList>
</xsl:template>

Upvotes: 4

Views: 4566

Answers (3)

Dimitre Novatchev

Reputation: 243579

Here is an efficient XSLT 1.0 solution using keys:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kHitById" match="Hit" use="@ID"/>
 <xsl:key name="kHitAById" match="Hit[@System = 'A']" use="@ID"/>

 <xsl:template match=
  "Hit[generate-id() = generate-id(key('kHitById',@ID)[1])]">

  <xsl:copy-of select=
  "key('kHitAById', @ID)[1]|current()[not(key('kHitAById', @ID))]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the following XML document (intentionally adapted from the provided one, to make it more interesting by placing some Bs before the corresponding As):

<HitList>
   <Hit System="B" ID="1"/>
   <Hit System="A" ID="1"/>
   <Hit System="B" ID="2"/>
   <Hit System="A" ID="2"/>
   <Hit System="A" ID="2"/>
   <Hit System="B" ID="3"/>
   <Hit System="B" ID="4"/>
</HitList>

the wanted, correct result is produced:

<Hit System="A" ID="1"/>
<Hit System="A" ID="2"/>
<Hit System="B" ID="3"/>
<Hit System="B" ID="4"/>
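
Note that the output above has no HitList wrapper, because the stylesheet has no template for that element. If the wrapper is needed, one possible variation (a sketch, not part of the original answer) is to add an identity template plus an empty template for the non-first Hit elements. The key names are reused from above; the two extra templates are my own additions and rely on default template priorities.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kHitById" match="Hit" use="@ID"/>
 <xsl:key name="kHitAById" match="Hit[@System = 'A']" use="@ID"/>

 <!-- Added: identity template, so HitList itself is copied to the output -->
 <xsl:template match="@*|node()">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>

 <!-- Added: any Hit that is not the first one with its ID produces nothing -->
 <xsl:template match="Hit"/>

 <!-- First Hit per ID (Muenchian grouping): emit the first A hit with that ID
      if one exists, otherwise the current hit. The predicate gives this
      pattern a higher default priority than the plain "Hit" pattern. -->
 <xsl:template match=
  "Hit[generate-id() = generate-id(key('kHitById',@ID)[1])]">

  <xsl:copy-of select=
  "key('kHitAById', @ID)[1]|current()[not(key('kHitAById', @ID))]"/>
 </xsl:template>
</xsl:stylesheet>
```

Applied to the same input, this should produce the same four Hit elements, wrapped in a HitList element.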

Upvotes: 3

Michael Kay

Reputation: 163595

XSLT 2.0 solution:

<xsl:template match="HitList">
<HitList>
  <xsl:for-each-group select="*" group-by="@ID">
    <xsl:copy-of select="current-group()[1]"/>
  </xsl:for-each-group>
</HitList>
</xsl:template>

This assumes the As will always precede the Bs. If that's not the case you could replace the inner instruction with

<xsl:copy-of select="(current-group()[@System='A'], current-group()[@System='B'])[1]"/>
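
For completeness, here is a self-contained sketch of the order-independent variant. It assumes an XSLT 2.0 processor and the HitList/Hit structure from the question. The xsl:stylesheet wrapper, the xsl:output setting, and selecting Hit rather than * are my own assumptions and not part of the answer above.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:template match="HitList">
    <HitList>
      <!-- One group per distinct ID, in order of first appearance -->
      <xsl:for-each-group select="Hit" group-by="@ID">
        <!-- The sequence lists the group's A hits before its B hits,
             so [1] picks an A hit whenever the group contains one -->
        <xsl:copy-of select="(current-group()[@System='A'], current-group()[@System='B'])[1]"/>
      </xsl:for-each-group>
    </HitList>
  </xsl:template>
</xsl:stylesheet>
```

Because the A hits are placed first in the sequence explicitly, the result does not depend on where the A and B elements appear in the input.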

Upvotes: 3

Daniel Haley

Reputation: 52888

This can be done with a single override of the identity template...

XML Input

<HitList>
    <Hit System="A" ID="1"/>
    <Hit System="A" ID="2"/>
    <Hit System="A" ID="2"/>
    <Hit System="B" ID="1"/>
    <Hit System="B" ID="2"/>
    <Hit System="B" ID="3"/>
    <Hit System="B" ID="4"/>
</HitList>

XSLT 1.0

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="Hit[(@System='B' and @ID=../Hit[@System='A']/@ID) or 
        @ID = preceding-sibling::Hit[@System='A']/@ID]"/>

</xsl:stylesheet>

Output

<HitList>
   <Hit System="A" ID="1"/>
   <Hit System="A" ID="2"/>
   <Hit System="B" ID="3"/>
   <Hit System="B" ID="4"/>
</HitList>
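
The empty template above removes B hits that have a matching A hit, and repeated A hits, but it would keep two B hits that share an ID with no A counterpart. If that case can occur in your data (an assumption; the sample input does not contain it), a possible extension is to add a third condition to the predicate. This is a sketch along the same lines, not a tested drop-in:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Identity template: copy everything by default -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Suppress a Hit when:
         1. it is a B hit and some A hit has the same ID, or
         2. an earlier A hit has the same ID, or
         3. (added) it is a B hit and an earlier B hit has the same ID -->
    <xsl:template match="Hit[(@System='B' and @ID=../Hit[@System='A']/@ID) or
        @ID = preceding-sibling::Hit[@System='A']/@ID or
        (@System='B' and @ID = preceding-sibling::Hit[@System='B']/@ID)]"/>

</xsl:stylesheet>
```

For the sample input the third condition never fires, so the output should be the same as shown above.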

Upvotes: 3
