Reputation: 6527
I have some XML from which I would like to remove identical consecutive child nodes that sit in different parents. That is, if the same child node appears in two or more consecutive parents, I want to keep the first occurrence and remove all the duplicates.
The duplicate nodes I'm thinking of are the <child>a</child> elements in the first two <parent> nodes.
An example:
Here is the source XML:
<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>a</child>
    <child>bb</child>
    <child>cc</child>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
  <parent>
    <child>a</child>
    <child>bbbb</child>
    <child>cccc</child>
  </parent>
</root>
Here is the desired XML:
<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>bb</child>
    <child>cc</child>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
  <parent>
    <child>a</child>
    <child>bbbb</child>
    <child>cccc</child>
  </parent>
</root>
Only one element is removed here, but if there were, for example, five consecutive <child>a</child> nodes at the beginning (instead of two), four of them would be removed. I'm using XSLT 2.0.
I appreciate any help.
Follow-Up:
Thanks to Kirill I get the documents I want. However, this has raised a new problem that I didn't anticipate. If I have an XML document like this:
<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
</root>
and I apply Kirill's XSLT to it, I get this:
<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
</root>
How can I also remove the <parent> </parent>? For my application there may be other subelements of <parent>, which are OK to remove if there is no <child> element in the <parent> element.
One solution I have, but don't like, is to apply a second transform after the first one. It only works when the two are applied in order, though, and it requires a separate XSLT file and two commands instead of one.
Here it is:
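<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Drop any parent that no longer contains a child element -->
  <xsl:template match="parent[not(child)]"/>
</xsl:stylesheet>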
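Because XSLT 2.0 allows templates to be applied to a temporary tree held in a variable, both passes could in principle live in one stylesheet by giving each pass its own mode. The following is only a minimal sketch, assuming the first pass is the duplicate-removing template from Kirill's answer; the mode names `dedupe` and `prune` and the variable name `pass1` are illustrative choices, not part of any answer here.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Pass 1 (hypothetical mode "dedupe"): identity copy ... -->
  <xsl:template match="@* | node()" mode="dedupe">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="dedupe"/>
    </xsl:copy>
  </xsl:template>

  <!-- ... minus children that repeat a child of the preceding parent -->
  <xsl:template mode="dedupe"
      match="child[../preceding-sibling::parent[1]/child = .]"/>

  <!-- Pass 2 (hypothetical mode "prune"): identity copy ... -->
  <xsl:template match="@* | node()" mode="prune">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="prune"/>
    </xsl:copy>
  </xsl:template>

  <!-- ... minus parents that were left without any child -->
  <xsl:template mode="prune" match="parent[not(child)]"/>

  <!-- Entry point: build the pass-1 result as a temporary tree,
       then run pass 2 over that tree -->
  <xsl:template match="/">
    <xsl:variable name="pass1">
      <xsl:apply-templates select="node()" mode="dedupe"/>
    </xsl:variable>
    <xsl:apply-templates select="$pass1/node()" mode="prune"/>
  </xsl:template>
</xsl:stylesheet>
```

Because the second pass runs over the result of the first, `parent[not(child)]` sees the tree with the duplicates already removed, which is the same ordering the two separate files rely on.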
Upvotes: 2
Views: 1003
Reputation: 243449
This answers the newly added follow-up question:
How can I also remove the <parent> </parent>? For my application there may be other subelements of <parent>, which are OK to remove if there is no <child> element in the <parent> element.
This transformation is an add-on to Kirill's and removes the parent element that would otherwise be left empty, without the need for a second pass:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop a child whose value equals that of any child in the immediately preceding parent -->
  <xsl:template match="child[../preceding-sibling::parent[1]/child = .]"/>

  <!-- Drop a parent in which no child would survive the template above -->
  <xsl:template match=
    "parent
       [not(child
              [not(. = ../preceding-sibling::parent[1]
                          /child
                   )
              ]
           )
       ]"/>
</xsl:stylesheet>
when applied to the provided XML document:
<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
</root>
the wanted, correct result is produced:
<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
</root>
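Since the question mentions XSLT 2.0, the pattern of the second template could arguably be written more readably with a quantified expression. This is only a sketch under that assumption (it needs an XSLT 2.0 processor); the variable name `$c` is illustrative, and the logic is meant to mirror the stylesheet above rather than replace it.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop a child that repeats a child of the immediately preceding parent -->
  <xsl:template match="child[. = ../preceding-sibling::parent[1]/child]"/>

  <!-- Drop a parent when every one of its children repeats a child
       of the immediately preceding parent -->
  <xsl:template match="parent[every $c in child
                               satisfies $c = preceding-sibling::parent[1]/child]"/>
</xsl:stylesheet>
```

An `every` expression over an empty sequence is true, so a parent with no child at all is dropped as well, just as with the `not(child[...])` form.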
Upvotes: 0
Reputation: 163322
If you're able to use XSLT 2.0, the problem is solved as follows:
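<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="root">
    <root>
      <!-- Group adjacent parents that share the same first child value -->
      <xsl:for-each-group select="parent" group-adjacent="child[1]">
        <xsl:for-each select="current-group()">
          <parent>
            <!-- Keep the first child only in the first parent of each group -->
            <xsl:if test="position()=1">
              <xsl:copy-of select="child[1]"/>
            </xsl:if>
            <xsl:copy-of select="child[position() gt 1]"/>
          </parent>
        </xsl:for-each>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>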
Upvotes: 2
Reputation: 56162
Use:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop a child whose value equals that of any child in the immediately preceding parent -->
  <xsl:template match="child[../preceding-sibling::parent[1]/child = .]"/>
</xsl:stylesheet>
Upvotes: 3