Allen

How to remove duplicate XML nodes using XSLT

I've got an extremely long XML file, like this:

<Root>
   <ele1>
      <child1>context1</child1>
      <child2>test1</child2>
      <child1>context1</child1>
   </ele1>

   <ele2>
      <child1>context2</child1>
      <child2>test2</child2>
      <child1>context2</child1>
   </ele2>
   <ele3>...........<elen>
</Root>

Now I want to remove the second <child1> from each <ele*> element using XSLT. Is that possible? The result should look like this:

<Root>
   <ele1>
      <child1>context1</child1>
      <child2>test1</child2>
   </ele1>

   <ele2>
      <child1>context2</child1>
      <child2>test2</child2>
   </ele2>
   <ele3>...........<elen>
</Root>

Thank you and best regards,

Allen

Upvotes: 3

Views: 14631

Answers (3)

ABach

Reputation: 3738

If the XML the OP provided is representative of the question (that is, the second <child1> inside each <ele*> element should be removed), then Muenchian grouping isn't necessary:

XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity Template: copies everything as-is -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Remove the 2nd <child1> element from each <ele*> element -->
  <xsl:template match="*[starts-with(name(), 'ele')]/child1[2]" />

</xsl:stylesheet>

When run against the provided XML:

<?xml version="1.0" encoding="UTF-8"?>
<Root>
  <ele1>
    <child1>context1</child1>
    <child2>test1</child2>
    <child1>context1</child1>
  </ele1>
  <ele2>
    <child1>context2</child1>
    <child2>test2</child2>
    <child1>context2</child1>
  </ele2>
</Root>

...the desired result is produced:

<?xml version="1.0" encoding="UTF-8"?>
<Root>
  <ele1>
    <child1>context1</child1>
    <child2>test1</child2>
  </ele1>
  <ele2>
    <child1>context2</child1>
    <child2>test2</child2>
  </ele2>
</Root>
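
Note that the [2] predicate removes only the second <child1>; a third or fourth one would be copied through. If the real data can contain more than two, a minimal variation (a sketch, assuming every <child1> after the first one in its parent should be dropped) could match on the presence of an earlier sibling instead:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity Template: copies everything as-is -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: any <child1> that has an earlier <child1> sibling is unwanted -->
  <xsl:template match="*[starts-with(name(), 'ele')]/child1[preceding-sibling::child1]" />

</xsl:stylesheet>
```

For the sample input above, where each <ele*> holds exactly two <child1> elements, this should behave the same as the [2] version.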

Upvotes: 2

Dimitre Novatchev

Reputation: 243449

This question requires a slightly more detailed answer than just pointing to a good source on Muenchian grouping.

The reason is that the required grouping must identify each child of an "ele[SomeString]" element by both its name and its parent. Such grouping requires a key whose value combines both parts, usually via concatenation.

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kElByName" match="*"
      use="concat(generate-id(..), '+',name())"/>

    <xsl:template match="node()|@*">
      <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
    </xsl:template>

    <xsl:template match="*[starts-with(name(), 'ele')]">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates select=
         "*[generate-id()
           =
            generate-id(key('kElByName',
                        concat(generate-id(..), '+',name())
                        )[1])
            ]"
         />
      </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

when applied to this XML document:

<Root>
    <ele1>
        <child1>context1</child1>
        <child2>test1</child2>
        <child1>context1</child1>
    </ele1>
    <ele2>
        <child1>context2</child1>
        <child2>test2</child2>
        <child1>context2</child1>
    </ele2>
    <ele3>
        <child2>context2</child2>
        <child2>test2</child2>
        <child1>context1</child1>
    </ele3>
</Root>

produces the wanted result:

<Root>
    <ele1>
        <child1>context1</child1>
        <child2>test1</child2>
    </ele1>
    <ele2>
        <child1>context2</child1>
        <child2>test2</child2>
    </ele2>
    <ele3>
        <child2>context2</child2>
        <child1>context1</child1>
    </ele3>
</Root>
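
The transformation above targets XSLT 1.0. If an XSLT 2.0 processor happens to be available (an assumption, the question doesn't say), the same per-parent grouping by element name can be sketched with xsl:for-each-group, which avoids the key and the generate-id() comparison:

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <!-- Identity template: copies everything as-is -->
    <xsl:template match="node()|@*">
      <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
    </xsl:template>

    <xsl:template match="*[starts-with(name(), 'ele')]">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <!-- Group the children of this element by name; keep the first of each group -->
        <xsl:for-each-group select="*" group-by="name()">
          <xsl:apply-templates select="current-group()[1]"/>
        </xsl:for-each-group>
      </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Because the grouping runs separately inside each matched element, the parent does not need to be part of the grouping key here. Groups are processed in order of first appearance, so the surviving children should keep their original relative order.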

Upvotes: 4

annakata

Reputation: 75804

Your XML and question are somewhat unclear, but what you're looking for is commonly called the Muenchian grouping method; it's another way of asking for distinct nodes. With appropriate keys, this can be done very efficiently.
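
As a rough illustration of what such a key might look like, here is a minimal XSLT 1.0 sketch. It assumes "duplicate" means same parent, same element name, and same text value; the key name kChildByParentNameValue is made up for this example, and the question does not confirm that this is the intended definition of a duplicate.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Hypothetical key: indexes each child of an ele* element
       by its parent's id, its own name, and its string value -->
  <xsl:key name="kChildByParentNameValue"
           match="*[starts-with(name(..), 'ele')]"
           use="concat(generate-id(..), '+', name(), '+', .)"/>

  <!-- Identity template: copies everything as-is -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Suppress any child that is not the first node in its key group -->
  <xsl:template match="*[starts-with(name(..), 'ele')]
                        [generate-id() !=
                         generate-id(key('kChildByParentNameValue',
                                         concat(generate-id(..), '+', name(), '+', .))[1])]"/>
</xsl:stylesheet>
```

With the sample in the question, the repeated <child1> in each <ele*> has the same name and text as the first one, so it falls into the same key group and is dropped. Two siblings with the same name but different text would both be kept under this definition.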

Upvotes: -1
