Reputation: 175
Input file:
<myroot>
<nodeA id="a">
<section id="i">
<item id="0" method="a"> <!-- parent section id="i" , keep this node-->
<somechild>a</somechild>
</item>
<item id="1" method="a">
<otherchild>a</otherchild>
</item>
</section>
<cell id="i">
<part id="1" method="b"> <!-- parent cell id="i", keep this node-->
<attr>u</attr>
</part>
</cell>
<section id="i">
<item id="0" method="a"> <!-- parent section id="i", remove this node-->
<type>blah</type>
</item>
<item id="3" method="a">
<other>xx</other>
</item>
<item id="0" method="b"> <!-- this has same id but different method, so we keep this -->
<otherchild>a</otherchild>
</item>
</section>
<cell id="i">
<part id="1" method="b"> <!-- parent cell id="i", remove this node -->
<attr>y</attr>
</part>
</cell>
</nodeA>
<nodeA id="b">
<section id="i">
<item id="1" method="a">
<otherchild>a</otherchild>
</item>
</section>
<section id="i">
<item id="0" method="a">
<type>blah</type>
</item>
<item id="1" method="a">
<other>xx</other>
</item>
</section>
</nodeA>
<nodeB id="a">
<cell id="i">
<part id="1" method="b">
<attr>u</attr>
</part>
</cell>
<section id="i">
<item id="0" method="a">
<type>blah</type>
</item>
</section>
<cell id="i">
<part id="1" method="b">
<attr>y</attr>
</part>
</cell>
</nodeB>
</myroot>
Expected output:
<myroot>
<nodeA id="a">
<section id="i">
<item id="0" method="a">
<somechild>a</somechild>
</item>
<item id="1" method="a">
<otherchild>a</otherchild>
</item>
</section>
<cell id="i">
<part id="1" method="b">
<attr>u</attr>
</part>
</cell>
<section id="i">
<item id="3" method="a">
<other>xx</other>
</item>
<item id="0" method="b"> <!-- this has same id but different method, so we keep this -->
<otherchild>a</otherchild>
</item>
</section>
</nodeA>
<nodeA id="b">
<section id="i">
<item id="1" method="a">
<otherchild>a</otherchild>
</item>
</section>
<section id="i">
<item id="0" method="a">
<type>blah</type>
</item>
</section>
</nodeA>
<nodeB id="a">
<cell id="i">
<part id="1" method="b">
<attr>u</attr>
</part>
</cell>
<section id="i">
<item id="0" method="a">
<type>blah</type>
</item>
</section>
</nodeB>
</myroot>
Can anyone help me with a transformation so that, if a node occurs two or more times and has the same parent id, only the first occurrence is kept and the others are discarded?
The file also contains other elements besides <nodeA>, namely <nodeB></nodeB>, <nodeC></nodeC>, etc.
Thanks very much.
John
Upvotes: 0
Views: 1070
Reputation: 70618
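I think you need to define a key to 'group' the duplicates. Judging by your expected output, elements are duplicates when they share the same element name, the same @id and @method attributes, the same parent name and @id, and the same grandparent name and @id (so an item under nodeA id="b" is not treated as a duplicate of one under nodeA id="a"). Therefore you would define the key like so: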
<xsl:key
name="duplicates"
match="*"
use="concat(local-name(), '|', @id, '|', @method, '|', local-name(..), '|', ../@id, '|', local-name(../..), '|', ../../@id)"/>
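If it helps to see which elements collide, here is a small debugging sketch (not part of the solution itself) that prints the same concatenated value for every element that has an @id and no descendants with an @id. The choice of text output and the for-each selection are my own assumptions for illustration; only the concat expression is taken from the key above.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Debugging aid only: plain-text output, one key value per line -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Visit, in document order, every element with a non-empty @id
         that contains no further elements with an @id -->
    <xsl:for-each select="//*[@id!=''][not(.//*[@id!=''])]">
      <!-- Same expression as the 'use' attribute of the key -->
      <xsl:value-of select="concat(local-name(), '|', @id, '|', @method, '|', local-name(..), '|', ../@id, '|', local-name(../..), '|', ../../@id)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For your sample, both the first item and the item marked "remove this node" under nodeA id="a" should produce item|0|a|section|i|nodeA|a, whereas the item with method="b" produces item|0|b|section|i|nodeA|a and so lands in a different group.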
Then, you need to ignore elements that are not the first in their key group. I think you also need a clause so that only the 'child' elements are matched, that is, elements with no descendants that carry an @id (otherwise whole section elements would be ignored):
<xsl:template
match="*
[@id!='']
[not(.//*[@id!=''])]
[generate-id() != generate-id(key('duplicates', concat(local-name(), '|', @id, '|', @method, '|', local-name(..), '|', ../@id, '|', local-name(../..), '|', ../../@id))[1])]" />
To add to the complexity, it looks like you don't want to output container elements (such as section or cell) when all of their child elements are duplicates. A second empty template handles that case:
<xsl:template
match="*
[@id!='']
[.//*[@id!='']]
[not(.//*
[not(.//*[@id!=''])]
[generate-id() = generate-id(key('duplicates', concat(local-name(), '|', @id, '|', @method, '|', local-name(..), '|', ../@id, '|', local-name(../..), '|', ../../@id))[1])])
]" />
Putting it together, try the following XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:key name="duplicates" match="*" use="concat(local-name(), '|', @id, '|', @method, '|', local-name(..), '|', ../@id, '|', local-name(../..), '|', ../../@id)"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[@id!=''][not(.//*[@id!=''])][generate-id() != generate-id(key('duplicates', concat(local-name(), '|', @id, '|', @method, '|', local-name(..), '|', ../@id, '|', local-name(../..), '|', ../../@id))[1])]"/>
<xsl:template match="*[@id!=''][.//*[@id!='']][not(.//*[not(.//*[@id!=''])][generate-id() = generate-id(key('duplicates', concat(local-name(), '|', @id, '|', @method, '|', local-name(..), '|', ../@id, '|', local-name(../..), '|', ../../@id))[1])])]"/>
</xsl:stylesheet>
When applied to your sample XML, the following is output:
<myroot>
<nodeA id="a">
<section id="i">
<item id="0" method="a"><!-- parent section id="i" , keep this node-->
<somechild>a</somechild>
</item>
<item id="1" method="a">
<otherchild>a</otherchild>
</item>
</section>
<cell id="i">
<part id="1" method="b"><!-- parent cell id="i", keep this node-->
<attr>u</attr>
</part>
</cell>
<section id="i">
<item id="3" method="a">
<other>xx</other>
</item>
<item id="0" method="b"><!-- this has same id but different method, so we keep this -->
<otherchild>a</otherchild>
</item>
</section>
</nodeA>
<nodeA id="b">
<section id="i">
<item id="1" method="a">
<otherchild>a</otherchild>
</item>
</section>
<section id="i">
<item id="0" method="a">
<type>blah</type>
</item>
</section>
</nodeA>
<nodeB id="a">
<cell id="i">
<part id="1" method="b">
<attr>u</attr>
</part>
</cell>
<section id="i">
<item id="0" method="a">
<type>blah</type>
</item>
</section>
</nodeB>
</myroot>
Upvotes: 1