John Smith

Reputation: 13

Removing duplicate elements and collecting all sub nodes together with xslt

I am new to XSLT and have an awkward problem that I have spent many hours on without reaching a solution. Thanks in advance for any help.

I have an XML document like this:

<root>

<ELEMENT id="1" >

<CHILD name="aaa">
<EMPLOYEE>Mark</EMPLOYEE>
<EMPLOYEE>John</EMPLOYEE>
</CHILD>

<CHILD name="bbb">
<EMPLOYEE>Tom</EMPLOYEE>
</CHILD>

</ELEMENT>


<ELEMENT id="2" >

<CHILD name="aaa">
<EMPLOYEE>leo</EMPLOYEE>
<EMPLOYEE>Jason</EMPLOYEE>
</CHILD>


</ELEMENT>

<ELEMENT id="1" >

<CHILD name="aaa">
<EMPLOYEE>Tim</EMPLOYEE>
</CHILD>


</ELEMENT>

</root>

What I am trying to do is collect the "EMPLOYEE"s that share the same "ELEMENT" id and "CHILD" name, and remove the duplicate "ELEMENT"s and "CHILD"s.

I mean I need only one ELEMENT with id=1, containing only one CHILD with name=aaa, which holds all the employees belonging to id=1 and name=aaa. The resulting file should look like this:

<root>
<ELEMENT id="1" >

<CHILD name="aaa">
<EMPLOYEE>Mark</EMPLOYEE>
<EMPLOYEE>John</EMPLOYEE>
<EMPLOYEE>Tim</EMPLOYEE>
</CHILD>

<CHILD name="bbb">
<EMPLOYEE>Tom</EMPLOYEE>
</CHILD>

</ELEMENT>


<ELEMENT id="2" >

<CHILD name="aaa">
<EMPLOYEE>leo</EMPLOYEE>
<EMPLOYEE>Jason</EMPLOYEE>
</CHILD>


</ELEMENT>
</root>

What should my XSLT code look like? Do I need to iterate with a for-each loop, or should I apply recursive templates?

Many thanks.


Thank you very much for your useful answer. It is far beyond what I could have come up with myself. However, the code removes some of the CHILD nodes that it should not remove.

I tried it on a more complex XML document:

<ROOT>
  <ELEMENT id="1" >
    <CHILD name="aaa">
      <EMPLOYEE>
        asdf
      </EMPLOYEE>
      <EMPLOYEE>
       asdf
      </EMPLOYEE>
    </CHILD>
    <CHILD name="bbb">
      <EMPLOYEE>
       adsf
      </EMPLOYEE>
    </CHILD>
  </ELEMENT>
  <ELEMENT id="1" >
    <CHILD name="aaa">
          <EMPLOYEE>
       asdf
      </EMPLOYEE>
      <EMPLOYEE>
       asdf
      </EMPLOYEE>
    </CHILD>
    <CHILD name="ccc">
          <EMPLOYEE>
        asdf
      </EMPLOYEE>
    </CHILD>
  </ELEMENT>
  <ELEMENT id="2" >
    <CHILD name="ddd">
      <EMPLOYEE>
       asdf
      </EMPLOYEE>
    </CHILD>
    <CHILD name="eee">
      <EMPLOYEE>
       asdf
      </EMPLOYEE>
    </CHILD>
  </ELEMENT>
  <ELEMENT id="3" >
    <CHILD name="xxx">
      <EMPLOYEE>
       asdf
      </EMPLOYEE>
    </CHILD>
    <CHILD name="yyy">
      <EMPLOYEE>
       asdf
      </EMPLOYEE>
    </CHILD>
  </ELEMENT>
  <ELEMENT id="4" >
    <CHILD name="ddd">
      <EMPLOYEE>
        asdf
      </EMPLOYEE>

    </CHILD>
    <CHILD name="aaa">
      <EMPLOYEE>
       adsf
      </EMPLOYEE>

    </CHILD>
  </ELEMENT>
  <ELEMENT id="3" >
    <CHILD name="xxx">
      <EMPLOYEE>
       asdf
      </EMPLOYEE>
    </CHILD>
    <CHILD name="aaa">
      <EMPLOYEE>
       asdf
      </EMPLOYEE>
    </CHILD>
  </ELEMENT>
  <ELEMENT id="1" >
    <CHILD name="aaa">
      <EMPLOYEE>
    asdf
      </EMPLOYEE>

   </CHILD>
    <CHILD name="bbb">
      <EMPLOYEE>
        asdf
      </EMPLOYEE>
    </CHILD>
  </ELEMENT>
  <ELEMENT id="2" >
    <CHILD name="ddd">
      <EMPLOYEE>
        asdf
      </EMPLOYEE>
      <EMPLOYEE>
       asdf
      </EMPLOYEE>

    </CHILD>
    <CHILD name="aaa">
      <EMPLOYEE>
     asdf
      </EMPLOYEE>
    </CHILD>
  </ELEMENT>

</ROOT>

Upvotes: 1

Views: 693

Answers (1)

Dimitre Novatchev

Reputation: 243469

This transformation:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:key name="kElemById" match="ELEMENT" use="@id"/>
     <xsl:key name="kChildByNameAndParentId"
      match="CHILD" use="concat(../@id, '+', @name)"/>
     <xsl:key name="kEmplByAnc" match="EMPLOYEE"
      use="concat(../../@id, '+', ../@name)"/>

     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match=
      "ELEMENT
         [generate-id()
         =
          generate-id(key('kElemById', @id)[1])
         ]
      ">
       <xsl:copy>
         <xsl:apply-templates select="@*"/>
         <xsl:apply-templates select=
         "../ELEMENT/CHILD
           [generate-id()
           =
            generate-id(key('kChildByNameAndParentId',
                            concat(current()/@id,
                                  '+',
                                  @name
                                  )
                            )[1]
                        )
           ]
         "/>
       </xsl:copy>
     </xsl:template>

     <xsl:template match="CHILD">
       <xsl:copy>
         <xsl:apply-templates select=
         "@*
         |
          key('kEmplByAnc', concat(../@id, '+', @name))"/>
       </xsl:copy>
     </xsl:template>

     <xsl:template match="ELEMENT"/>
</xsl:stylesheet>

when applied on the provided XML document:

<root>
    <ELEMENT id="1" >
        <CHILD name="aaa">
            <EMPLOYEE>Mark</EMPLOYEE>
            <EMPLOYEE>John</EMPLOYEE>
        </CHILD>
        <CHILD name="bbb">
            <EMPLOYEE>Tom</EMPLOYEE>
        </CHILD>
    </ELEMENT>
    <ELEMENT id="2" >
        <CHILD name="aaa">
            <EMPLOYEE>leo</EMPLOYEE>
            <EMPLOYEE>Jason</EMPLOYEE>
        </CHILD>
    </ELEMENT>
    <ELEMENT id="1" >
        <CHILD name="aaa">
            <EMPLOYEE>Tim</EMPLOYEE>
        </CHILD>
    </ELEMENT>
</root>

produces the wanted, correct result:

<root>
   <ELEMENT id="1">
      <CHILD name="aaa">
         <EMPLOYEE>Mark</EMPLOYEE>
         <EMPLOYEE>John</EMPLOYEE>
         <EMPLOYEE>Tim</EMPLOYEE>
      </CHILD>
      <CHILD name="bbb">
         <EMPLOYEE>Tom</EMPLOYEE>
      </CHILD>
   </ELEMENT>
   <ELEMENT id="2">
      <CHILD name="aaa">
         <EMPLOYEE>leo</EMPLOYEE>
         <EMPLOYEE>Jason</EMPLOYEE>
      </CHILD>
   </ELEMENT>
</root>

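Explanation: This uses the Muenchian grouping method with three keys. kElemById finds the first ELEMENT for each id, kChildByNameAndParentId finds the first CHILD for each combination of parent id and name, and kEmplByAnc collects every EMPLOYEE that shares that combination. The generate-id() comparisons keep only the first node of each group, and the empty template matching ELEMENT discards the remaining duplicates.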
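
For comparison, if an XSLT 2.0 processor is available (an assumption; the stylesheet above targets XSLT 1.0), the same two-level grouping can be sketched with xsl:for-each-group instead of keys. This is only an illustrative alternative, not part of the transformation above. Like that transformation, it takes the attributes of each output ELEMENT and CHILD from the first node of its group:

```xml
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Matches the document element, whatever its name (root, ROOT, ...) -->
  <xsl:template match="/*">
    <xsl:copy>
      <!-- One group per distinct ELEMENT id, in order of first appearance -->
      <xsl:for-each-group select="ELEMENT" group-by="@id">
        <!-- The context node here is the first ELEMENT of the group -->
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <!-- Within that id, one group per distinct CHILD name -->
          <xsl:for-each-group select="current-group()/CHILD" group-by="@name">
            <xsl:copy>
              <xsl:copy-of select="@*"/>
              <!-- Every EMPLOYEE from all CHILDs in this id + name group -->
              <xsl:copy-of select="current-group()/EMPLOYEE"/>
            </xsl:copy>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

This also addresses the for-each versus recursive templates question: in XSLT 2.0 two nested grouping loops are enough, while in XSLT 1.0 the key-based approach above plays the same role.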

Upvotes: 1
