N0000B

Reputation: 449

XSLT to Remove Duplicate Nodes Based on Multiple Elements/Values

I have the following input XML snippet:

   <Customer>
      <FirstName>ABC</FirstName>
      <LastName>XYZ</LastName>
      <Location>LOC_1</Location>
      <Location>LOC_2</Location>
   </Customer>
   <Customer>
      <FirstName>ABC</FirstName>
      <LastName>XYZ</LastName>
      <Location>LOC_1</Location>
   </Customer>
   <Customer>
      <FirstName>SOME_OTHER_CUSTOMER</FirstName>
      <LastName>XYZ</LastName>
      <Location>LOC_1</Location>
   </Customer>

I want to filter out the second Customer node from the above XML, because it has the same FirstName (ABC) and LastName (XYZ) as the first node and matches one of its Location values (LOC_1). I am using XSLT 2.0.

Output XML

   <Customer>
      <FirstName>ABC</FirstName>
      <LastName>XYZ</LastName>
      <Location>LOC_1</Location>
      <Location>LOC_2</Location>
   </Customer>
   <Customer>
      <FirstName>SOME_OTHER_CUSTOMER</FirstName>
      <LastName>XYZ</LastName>
      <Location>LOC_1</Location>
   </Customer>

I have looked at some examples that use preceding-sibling with a single element value to decide whether a node is a duplicate. I couldn't make that approach work for my scenario, where duplicates have to be detected across multiple fields, one of which (Location) repeats.

Any help/suggestions to implement this in XSLT are much appreciated!

Upvotes: 0

Views: 47

Answers (1)

Martin Honnen

Reputation: 167436

Your post mentions XSLT 2. Since current releases of processors that used to implement XSLT 2, such as Saxon (XSLT 2 in versions 8.9 to 9.7, XSLT 3 since 9.8), are XSLT 3 processors, I first show an XSLT 3 solution:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all">
  
  <xsl:output indent="yes"/>

  <xsl:mode on-no-match="shallow-skip"/>

  <xsl:template match="Customers">
    <xsl:for-each-group select="Customer" composite="yes" group-by="LastName, FirstName">
      <xsl:copy>
        <xsl:copy-of select="FirstName, LastName"/>
        <xsl:for-each-group select="current-group()/Location" group-by=".">
          <xsl:copy-of select="."/>
        </xsl:for-each-group>
      </xsl:copy>
    </xsl:for-each-group>
  </xsl:template>
  
</xsl:stylesheet>

If you are stuck with an older version of a processor that really only supports XSLT 2 and not XSLT 3, remove the xsl:mode declaration and do the grouping as follows:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">
  
  <xsl:output indent="yes"/>

  <xsl:template match="Customers">
    <xsl:for-each-group select="Customer" group-by="concat(LastName, '|',  FirstName)">
      <xsl:copy>
        <xsl:copy-of select="FirstName, LastName"/>
        <xsl:for-each-group select="current-group()/Location" group-by=".">
          <xsl:copy-of select="."/>
        </xsl:for-each-group>
      </xsl:copy>
    </xsl:for-each-group>
  </xsl:template>
  
</xsl:stylesheet>

I have assumed that the Customer elements share a common parent element named Customers. Depending on your input structure, adjust that name in the match pattern to the real parent element name. If the elements don't have a single common parent, use a wider selection such as select="//Customer" for the outer for-each-group.
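
As a rough sketch of that last variant, assuming an XSLT 2 processor: the stylesheet below matches the document node instead of a named parent element and groups every Customer in the document. The Result wrapper element is a hypothetical name, introduced here only so that the output has a single root; replace it with whatever your target format needs.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output indent="yes"/>

  <!-- Match the document node, so no particular parent element name is assumed -->
  <xsl:template match="/">
    <!-- "Result" is a hypothetical wrapper that keeps the output well-formed -->
    <Result>
      <!-- Group all Customer elements in the document by last and first name -->
      <xsl:for-each-group select="//Customer" group-by="concat(LastName, '|', FirstName)">
        <!-- Shallow copy of the first Customer of each group -->
        <xsl:copy>
          <xsl:copy-of select="FirstName, LastName"/>
          <!-- Emit each distinct Location value of the group once -->
          <xsl:for-each-group select="current-group()/Location" group-by=".">
            <xsl:copy-of select="."/>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:for-each-group>
    </Result>
  </xsl:template>

</xsl:stylesheet>
```

Note that this groups Customer elements from anywhere in the document, so customers with the same name that sit under different parents end up merged into one Customer element.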

Upvotes: 1
