kumar

Reputation: 389

Merge similar nodes and remove duplicates

I am looking for a way to:

  1. Merge nodes whose names or values are similar.
  2. After merging, delete the duplicate attributes of the node.
  3. If two attributes have different values, replace the value of the first attribute with the value of the second attribute that got merged.

Here is a sample input:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<books>
    <mbean code="abc.def.ghi" name="com.booking.props:name=abcdefgh"> 
        <attribute name="BName">abc123</attribute> 
        <depends optional-attribute-name="Bookname">mypersonnelbook</depends> 
        <attribute name="Type1">book.type1.name</attribute> 
        <attribute name="Properties"> 
            bookname=book1
            price=100 
        </attribute> 
    </mbean>
    <mbean code="abc.def.ghi" name="com.booking.props:name=abcdefgh"> 
        <attribute name="BName">abc123</attribute> 
        <depends optional-attribute-name="Bookname">mypersonnelbook</depends> 
        <attribute name="Type2">book.type2.name</attribute>         
        <attribute name="Properties"> 
            bookname=book1
            price=100 
        </attribute> 
    </mbean>
    <us-country-factory>
        <jndi-name>books/props/Classic</jndi-name> 
        <file-name>book1</file-name> 
        <state-location>central.wharehouse</state-location> 
        <store-property name="store" type="java.lang.String">abc</store-property> 
        <store-property name="storetype" type="java.lang.String">1223</store-property> 
        <store-property name="storelocation" type="java.lang.String">defsdgfd</store-property> 
        <store-property name="storecategory" type="java.lang.String">hjtbngb</store-property> 
    </us-country-factory>
    <us-country-factory>
        <jndi-name>books/props/Classic</jndi-name> 
        <file-name>book1</file-name> 
        <state-location>central.wharehouse</state-location> 
        <store-property name="store" type="java.lang.String">defghij</store-property> 
        <store-property name="storetype" type="java.lang.String">1223</store-property> 
        <store-property name="storelocation" type="java.lang.String">32das</store-property> 
        <store-property name="storecategory" type="java.lang.String">hjtbngb</store-property> 
        <store-property name="storeratings" type="java.lang.String">5</store-property>
    </us-country-factory>
</books>

The output I am looking for is:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<books>
    <mbean code="abc.def.ghi" name="com.booking.props:name=abcdefgh">
        <attribute name="BName">abc123</attribute>
        <depends optional-attribute-name="Bookname">mypersonnelbook</depends>
        <attribute name="Type1">book.type1.name</attribute>
        <attribute name="Type2">book.type2.name</attribute>
        <attribute name="Properties">
            bookname=book1
            price=100
        </attribute>
    </mbean>
    <us-country-factory>
        <jndi-name>books/props/Classic</jndi-name>
        <file-name>book1</file-name>
        <state-location>central.wharehouse</state-location>
        <store-property name="store" type="java.lang.String">defghij</store-property>
        <store-property name="storetype" type="java.lang.String">1223</store-property>
        <store-property name="storelocation" type="java.lang.String">32das</store-property>
        <store-property name="storecategory" type="java.lang.String">hjtbngb</store-property>
        <store-property name="storeratings" type="java.lang.String">5</store-property>
    </us-country-factory>
</books>

This is the XSL file that I have tried:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:key name="mbeanName" match="//mbean/@name" use="."/>
    <xsl:key name="mbeanCount" match="//mbean[generate-id(@name) = generate-id(key('mbeanName', @name)[1])]" use="count(.)"/>
    <xsl:key name="us-country-factoryName" match="//us-country-factory[jndi-name/text()]" use="."/>
    <xsl:key name="us-country-factoryCount" match="/us-country-factory[generate-id(jndi-name/text()) = generate-id(key('us-country-factoryName', jndi-name/text())[1])]" use="count(.)"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="mbean[count(. | key('mbeanCount', /mbean/@name))]" />
    <xsl:template match="mbean[count(. | key('us-country-factoryCount', us-country-factory[jndi-name/text()]))]" />
</xsl:stylesheet>

Upvotes: 1

Views: 1109

Answers (1)

Martin Honnen

Reputation: 167471

As you have tagged the question as XSLT 2.0, I would suggest using for-each-group instead of keys. Here is a sample stylesheet:

<xsl:stylesheet
    version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/> 

<xsl:template match="books">
  <xsl:copy>
    <xsl:for-each-group select="mbean" group-by="@name">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:for-each-group select="current-group()/*" group-by="node-name(.)">
          <xsl:for-each-group select="current-group()" group-by="@*">
            <xsl:copy>
              <xsl:copy-of select="@*, current-group()[last()]/node()"/>
            </xsl:copy>
          </xsl:for-each-group>
        </xsl:for-each-group>
      </xsl:copy>
    </xsl:for-each-group>
    <xsl:for-each-group select="us-country-factory" group-by="jndi-name">
      <xsl:copy>
        <xsl:copy-of select="@*, *[not(@*)]"/>
        <xsl:for-each-group select="current-group()/*[@*]" group-by="string-join(@*, '|')">
          <xsl:copy>
            <xsl:copy-of select="@*, current-group()[last()]/node()"/>
          </xsl:copy>
        </xsl:for-each-group>
      </xsl:copy>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

That transforms

<books>
    <mbean code="abc.def.ghi" name="com.booking.props:name=abcdefgh"> 
        <attribute name="BName">abc123</attribute> 
        <depends optional-attribute-name="Bookname">mypersonnelbook</depends> 
        <attribute name="Type1">book.type1.name</attribute> 
        <attribute name="Properties"> 
            bookname=book1
            price=100 
        </attribute> 
    </mbean>
    <mbean code="abc.def.ghi" name="com.booking.props:name=abcdefgh"> 
        <attribute name="BName">abc123</attribute> 
        <depends optional-attribute-name="Bookname">mypersonnelbook</depends> 
        <attribute name="Type2">book.type2.name</attribute>         
        <attribute name="Properties"> 
            bookname=book1
            price=100 
        </attribute> 
    </mbean>
    <us-country-factory>
        <jndi-name>books/props/Classic</jndi-name> 
        <file-name>book1</file-name> 
        <state-location>central.wharehouse</state-location> 
        <store-property name="store" type="java.lang.String">abc</store-property> 
        <store-property name="storetype" type="java.lang.String">1223</store-property> 
        <store-property name="storelocation" type="java.lang.String">defsdgfd</store-property> 
        <store-property name="storecategory" type="java.lang.String">hjtbngb</store-property> 
    </us-country-factory>
    <us-country-factory>
        <jndi-name>books/props/Classic</jndi-name> 
        <file-name>book1</file-name> 
        <state-location>central.wharehouse</state-location> 
        <store-property name="store" type="java.lang.String">defghij</store-property> 
        <store-property name="storetype" type="java.lang.String">1223</store-property> 
        <store-property name="storelocation" type="java.lang.String">32das</store-property> 
        <store-property name="storecategory" type="java.lang.String">hjtbngb</store-property> 
        <store-property name="storeratings" type="java.lang.String">5</store-property> 
    </us-country-factory>
</books>

into

<books>
   <mbean code="abc.def.ghi" name="com.booking.props:name=abcdefgh">
      <attribute name="BName">abc123</attribute>
      <attribute name="Type1">book.type1.name</attribute>
      <attribute name="Properties">
            bookname=book1
            price=100
        </attribute>
      <attribute name="Type2">book.type2.name</attribute>
      <depends optional-attribute-name="Bookname">mypersonnelbook</depends>
   </mbean>
   <us-country-factory>
      <jndi-name>books/props/Classic</jndi-name>
      <file-name>book1</file-name>
      <state-location>central.wharehouse</state-location>
      <store-property name="store" type="java.lang.String">defghij</store-property>
      <store-property name="storetype" type="java.lang.String">1223</store-property>
      <store-property name="storelocation" type="java.lang.String">32das</store-property>
      <store-property name="storecategory" type="java.lang.String">hjtbngb</store-property>
      <store-property name="storeratings" type="java.lang.String">5</store-property>
   </us-country-factory>
</books>

I realize this is not a complete solution, but it should give you an idea of how you might approach the problem using XSLT 2.0. You will just have to define precisely what "merge nodes whose names or values are similar" means for your different element types and then implement that using for-each-group.
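
One way to pin down what "similar" means is to put the rule for each element type into a single function and group on its result. The following is only a sketch under assumptions: the function name f:merge-key and the namespace urn:example:merge are made up for illustration, and the rules (mbean by its name attribute, us-country-factory by its jndi-name child) simply mirror the grouping keys used in the stylesheet above. It only reports the groups it finds; it does not merge anything yet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:merge"
    exclude-result-prefixes="xs f">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: returns the string that decides which
       children of books count as "the same" element. -->
  <xsl:function name="f:merge-key" as="xs:string">
    <xsl:param name="e" as="element()"/>
    <xsl:sequence select="
      if ($e/self::mbean) then concat('mbean|', $e/@name)
      else if ($e/self::us-country-factory) then concat('factory|', $e/jndi-name)
      else generate-id($e)"/>
    <!-- generate-id() gives every other element its own key,
         so elements without a rule are never merged. -->
  </xsl:function>

  <xsl:template match="books">
    <xsl:copy>
      <xsl:for-each-group select="*" group-by="f:merge-key(.)">
        <!-- Placeholder output: one line per group with its key and size. -->
        <group key="{current-grouping-key()}" size="{count(current-group())}"/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Once the reported groups match what you expect, the placeholder group element can be replaced with the merging logic for that element type.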

Here is an attempt at an explanation. The code

<xsl:for-each-group select="mbean" group-by="@name">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:for-each-group select="current-group()/*" group-by="node-name(.)">
      <xsl:for-each-group select="current-group()" group-by="@*">
        <xsl:copy>
          <xsl:copy-of select="@*, current-group()[last()]/node()"/>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:for-each-group>

groups the mbean elements by their name attribute, then makes a shallow copy of the first element in each group (as you want to eliminate duplicates) and copies its attributes. An inner for-each-group then groups all child elements of the mbean elements in the group by element name (so all attribute elements form one group and all depends elements form another), and a further for-each-group groups those by their attribute values. Grouping on @* is a simplification that works because those elements have only one attribute each in the input. Inside the innermost for-each-group we therefore have groups of, for example, attribute name="BName" elements. We make a shallow copy of the first element in the group (the context node inside the for-each-group), copy its attributes, and then copy the contents of the last item in the group, so that the last value wins.
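
If the child elements can carry more than one attribute, the simplification above no longer holds, because group-by="@*" would then yield several grouping keys per element. A possible variant, sketched here as an assumption rather than a tested solution, builds one composite key per child from the element name and every attribute name/value pair, in the same spirit as the string-join(@*, '|') key used for the store-property elements. It handles only the mbean elements.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="books">
    <xsl:copy>
      <xsl:for-each-group select="mbean" group-by="@name">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <!-- One key per child element: its name plus each
               attribute as name=value, joined with '|'. -->
          <xsl:for-each-group select="current-group()/*"
              group-by="string-join((name(), for $a in @* return concat(name($a), '=', $a)), '|')">
            <xsl:copy>
              <!-- Attributes come from the first item in the group,
                   content from the last item in the group. -->
              <xsl:copy-of select="@*, current-group()[last()]/node()"/>
            </xsl:copy>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because there is only one grouping level here, the merged children appear in order of first occurrence instead of being sorted into attribute and depends blocks. Note that the key depends on the order in which the processor returns @*, so elements that list the same attributes in a different order may end up in different groups.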

Upvotes: 1
