smkndblvr

Reputation: 91

Using XSLT to remove duplicate nodes with different values/attributes

I'm trying to perform XSLT 1.0 transformations on a set of XML files, exported and transformed elsewhere, that contain duplicate nodes. I'm able to remove identical duplicate nodes, but not those with different values/attributes in them. What I'm trying to achieve is to retain only the second set of error nodes. Any help in understanding where I'm going wrong is appreciated!

A set of XML files have data like this:

<row xmlns="http://www.example.com/abc/xyz" xmlns:dg="http://www.example.com/abc/def" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <data>
    <status>Y</status>
    <product>48530</product>
    <id>12312343</id>
    <error xmlns="">true</error>
    <errorReason xmlns="">Detailed error message</errorReason>
    <error xmlns="">true</error>
    <errorReason xmlns="">Detailed error message</errorReason>
  </data>
</row>

When using the following XSL, the duplicates are removed:

<xsl:stylesheet version="1.0" exclude-result-prefixes="xsi d dg" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:d="http://www.example.com/abc/xyz" 
xmlns:dg="http://www.example.com/abc/def" >
<xsl:output omit-xml-declaration="yes" method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="comment()"/>

  <!-- Drill down into the export XML and extract only the main table row data -->
  <xsl:template match="d:row">
    <xsl:apply-templates select="d:data"/>
  </xsl:template>

  <xsl:template match="error[preceding::error]"/>
  <xsl:template match="errorReason[preceding::errorReason]"/>

</xsl:stylesheet>

However, when I try the same XSL for a set of XML files with data like this:

<row xmlns="http://www.example.com/abc/xyz" xmlns:dg="http://www.example.com/abc/def" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <data>
    <status>Y</status>
    <product>130160072014</product>
    <dob>11/11/1911</dob>
    <id>12312312</id>
    <error>false</error>
    <errorReason />
    <error xmlns="">true</error>
    <errorReason xmlns="">Detailed error message</errorReason>
  </data>
</row>

nothing happens; neither set of error nodes is removed.

I suspect the empty xmlns may be the cause, but I'm not sure.

Upvotes: 0

Views: 849

Answers (1)

Tim C

Reputation: 70598

This is because of namespaces. xmlns is a namespace declaration. In your first XML, the error and errorReason elements all have xmlns="" declared, which means they are all in no namespace.

However, in your second XML you do this:

<error>false</error>
<errorReason />
<error xmlns="">true</error>
<errorReason xmlns="">Detailed error message</errorReason>

The first error and errorReason don't have an explicit xmlns declaration, which means they are in the default namespace that was defined on the row element:

 <row xmlns="http://www.example.com/abc/xyz" 

The declaration applies not just to the row element but to its descendants as well, unless overridden.

This means the first error and errorReason are in a different namespace to the other two (which aren't actually in a namespace), and so they are effectively different elements. They are not matched by your XSLT templates, because an unprefixed name such as error in an XSLT 1.0 match pattern only matches elements in no namespace.
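
If you want to check which namespace each element actually ends up in, a small diagnostic stylesheet can help. The following is only a sketch and not part of your original transformation: it uses the standard XPath 1.0 functions local-name() and namespace-uri() and prints one line per element as plain text.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- For every element, print its local name and the namespace URI it resolves to -->
  <xsl:template match="*">
    <xsl:value-of select="local-name()"/>
    <xsl:text> -> [</xsl:text>
    <xsl:value-of select="namespace-uri()"/>
    <xsl:text>]&#10;</xsl:text>
    <!-- Recurse into child elements only, so text content is not printed -->
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against your second XML, the first error and errorReason should be reported with http://www.example.com/abc/xyz between the brackets, while the second pair should show empty brackets, i.e. no namespace.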

You haven't said which pair of elements you wish to retain: the ones in a namespace, or the ones without. However, if you really do want to remove "duplicates" regardless of namespace, you could use these two templates, which ignore namespaces altogether (and so will retain the first elements, which are the ones in the namespace in your case).

<xsl:template match="*[local-name() = 'error'][preceding::*[local-name() = 'error']]"/>
<xsl:template match="*[local-name() = 'errorReason'][preceding::*[local-name() = 'errorReason']]"/>
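
If it is the later pair you want to keep instead, one possible variation is to test for a following element rather than a preceding one. This is a minimal sketch under a few assumptions: the duplicates are always siblings inside the same parent (as in both of your samples), the last occurrence is the one to keep, and the row/comment handling from your own stylesheet is added back as needed.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy everything that is not matched below -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop an error element if a later sibling is also named error, in any namespace -->
  <xsl:template match="*[local-name() = 'error'][following-sibling::*[local-name() = 'error']]"/>

  <!-- Same rule for errorReason -->
  <xsl:template match="*[local-name() = 'errorReason'][following-sibling::*[local-name() = 'errorReason']]"/>
</xsl:stylesheet>
```

The following-sibling axis limits the comparison to elements under the same parent, whereas preceding looks back across the whole document, so with several data elements in one file the two approaches can give different results.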

Upvotes: 1
