Sunny

Reputation: 47

Compare and remove duplicates from XML using XSLT

I have the following XML document:

<root>
<Organization>
    <Organization_ID >111111</Organization_ID>
    <Organization_Code>ABC</Organization_Code>
</Organization>
<Organization>
    <Organization_ID >111111</Organization_ID>
    <Organization_Code>ABC</Organization_Code>
</Organization>
<Organization>
    <Organization_ID >111111</Organization_ID>
    <Organization_Code>ABCD</Organization_Code>
    <Organization_Type>Test</Organization_Type>
</Organization>

</root>

I need the following output, with duplicate records removed:

<root>

<Organization>
    <Organization_ID>111111</Organization_ID>
    <Organization_Code>ABC</Organization_Code>
</Organization>
<Organization>
    <Organization_ID>111111</Organization_ID>
    <Organization_Code>ABCD</Organization_Code>
    <Organization_Type>Test</Organization_Type>
</Organization>

</root>

I have already written the code below, which handles this case. My issue is that we need to compare all the child elements to see whether the records are exact duplicates. As soon as I add a condition for Organization_Type, the output contains all three records.

My Code:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

<xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="Organization">
    <xsl:if
        test="
            (not(following::Organization[Organization_ID = current()/Organization_ID])
            or not(following::Organization[Organization_Code = current()/Organization_Code])


            )">
        <xsl:copy>

            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:if>
</xsl:template>
</xsl:stylesheet>

The code I want to use, which isn't working:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

<xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="Organization">
    <xsl:if
        test="
            (not(following::Organization[Organization_ID = current()/Organization_ID])
            or not(following::Organization[Organization_Code = current()/Organization_Code])
            or not(following::Organization[Organization_Type = current()/Organization_Type])

            )">
        <xsl:copy>

            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:if>
</xsl:template>
</xsl:stylesheet>

Any help will be appreciated. Sorry, this is my first post, so I might not be posting in the correct place or in the correct format.

Upvotes: 0

Views: 3423

Answers (1)

Tim C

Reputation: 70648

Your stylesheet declares version 2.0, so assuming you are indeed using an XSLT 2.0 processor, you can use xsl:for-each-group here. Effectively, you group by a concatenation of Organization_ID, Organization_Code and Organization_Type, but output only the first element in each group, thus removing duplicates.

Try this XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes" />

    <xsl:template match="root">
      <xsl:copy>
          <xsl:for-each-group select="Organization" group-by="concat(Organization_ID, '|', Organization_Code, '|', Organization_Type)">
              <xsl:apply-templates select="." />
          </xsl:for-each-group>
      </xsl:copy>
    </xsl:template>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
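
If the set of child elements is not fixed, comparing whole records may be easier to maintain than listing each field in the grouping key. Here is a minimal sketch of that alternative, assuming an XSLT 2.0 processor and assuming that "duplicate" means the two Organization elements are deep-equal (same children, in the same order, with the same text). It uses the XPath 2.0 deep-equal() function rather than grouping, and xsl:strip-space so that indentation differences in the input are not counted as differences:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes"/>

    <!-- Ignore whitespace-only text nodes so indentation does not affect deep-equal() -->
    <xsl:strip-space elements="*"/>

    <!-- Identity template: copy everything not matched elsewhere -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Empty template: drop an Organization that is deep-equal to an earlier sibling -->
    <xsl:template match="Organization[some $p in preceding-sibling::Organization
                                      satisfies deep-equal(., $p)]"/>
</xsl:stylesheet>
```

As for why the attempt in the question keeps all three records: when the current Organization has no Organization_Type, the comparison inside the third predicate can never be true, so that not(...) is always true and the or lets the record through. Note also that deep-equal() is order-sensitive, so records whose children appear in a different order would not be treated as duplicates by this sketch.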

Upvotes: 1
