I don't know if that's the best title but here's what I'd like to do:
My input file:
<?xml version="1.0" encoding="UTF-8"?>
<out>
<cat id="d1e3">
<ip level="1" id="d1e3a1814" content="ABC">
<ip level="2" id="d1e3a1815" content="DEF"/>
</ip>
<pq level="1" id="d1e3a1911" content="XPQ"/>
</cat>
<cat id="d1e8">
<ip level="1" id="d1e8a1814" content="ABC">
<ip level="2" id="d1e8a1815" content="TXTXT"/>
</ip>
<pq level="1" id="d1e8a1911" content="XPQ"/>
</cat>
<cat id="d1e13">
<ip level="1" id="d1e13a1814" content="ABC">
<ip level="2" id="d1e13a1815" content="TXTXT"/>
</ip>
<pq level="1" id="d1e13a1911" content="XPQ">
<pq level="2" id="d1e13a1912" content="1234"/>
</pq>
</cat>
<cat id="d1e569">
<ip level="1" id="d1e569a1814" content="ABC">
<ip level="2" id="d1e569a1815" content="TXTXT"/>
</ip>
<pq level="1" id="d1e569a1911" content="XPQ">
<pq level="2" id="d1e569a1912" content="1234">
<pq level="3" id="d1e569a1913" content="345">
<pq level="4" id="d1e569a1914" content="456">
<pq level="5" id="d1e569a1915" content="567"/>
</pq>
</pq>
</pq>
</pq>
</cat>
<cat id="d1e666">
<ip level="1" id="d1e666a1814" content="ABC">
<ip level="2" id="d1e666a1815" content="TXTXT"/>
</ip>
<pq level="1" id="d1e666a1911" content="XPQ">
<pq level="2" id="d1e666a1912" content="1234">
<pq level="3" id="d1e666a1913" content="8787"/>
</pq>
</pq>
</cat>
</out>
My desired output:
<?xml version="1.0" encoding="UTF-8"?>
<out>
<new level="1" id="d1e3a1814" content="ABC">
<new level="2" id="d1e3a1815" content="DEF"/>
<new level="2" id="d1e8a1815" content="TXTXT"/>
</new>
<new level="1" id="d1e13a1911" content="XPQ">
<new level="2" id="d1e569a1912" content="1234">
<new level="3" id="d1e569a1913" content="345">
<new level="4" id="d1e569a1914" content="456">
<new level="5" id="d1e569a1915" content="567"/>
</new>
</new>
</new>
<new level="3" id="d1e666a1913" content="8787"/>
</new>
</out>
using XSLT 2.0 and Saxon 9 HE.
What happened here is that each ip node occurs only once in the output document, each of its children also occurs only once, and the same holds for the children of those children, and so on. The same applies to the pq nodes.
Each node should occur only once while the hierarchy is maintained. Basically, I need a document from which I can tell which node is the parent of which other node, but I need this information only once for each node (so in essence, each id should appear only once in the output document).
In the input document, ip and pq can have any number of child nodes, and these can vary in each node element.
I have tried using for-each-group on all ip elements, grouping by @id, but I can't figure out how to proceed from there. Do I need to do this recursively?
I'd be thankful for tips and help!
Upvotes: 1
Views: 253
Reputation: 167571
When running
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index ip and pq elements by their id attribute -->
  <xsl:key name="ip-id" match="ip" use="@id"/>
  <xsl:key name="pq-id" match="pq" use="@id"/>

  <!-- Identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the cat wrapper but process its children -->
  <xsl:template match="cat">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- First ip with a given id: output it with the ip children of all ip elements sharing that id -->
  <xsl:template match="ip[. is key('ip-id', @id)[1]]">
    <new-ip>
      <xsl:apply-templates select="@* , key('ip-id', @id)/ip"/>
    </new-ip>
  </xsl:template>

  <!-- Any later ip with the same id is suppressed -->
  <xsl:template match="ip[not(. is key('ip-id', @id)[1])]"/>

  <!-- Same two rules for pq -->
  <xsl:template match="pq[. is key('pq-id', @id)[1]]">
    <new-pq>
      <xsl:apply-templates select="@* , key('pq-id', @id)/pq"/>
    </new-pq>
  </xsl:template>

  <xsl:template match="pq[not(. is key('pq-id', @id)[1])]"/>
</xsl:transform>
against your input, I get the result
<out>
<new-ip level="1" id="d1e3a1814" content="ABC">
<new-ip level="2" id="d1e3a1815" content="DEF"/>
</new-ip>
<new-pq level="1" id="d1e3a1911" content="XPQ"/>
<new-ip level="1" id="d1e8a1814" content="ABC">
<new-ip level="2" id="d1e8a1815" content="TXTXT"/>
</new-ip>
<new-pq level="1" id="d1e8a1911" content="XPQ"/>
<new-ip level="1" id="d1e13a1814" content="ABC">
<new-ip level="2" id="d1e13a1815" content="TXTXT"/>
</new-ip>
<new-pq level="1" id="d1e13a1911" content="XPQ">
<new-pq level="2" id="d1e13a1912" content="1234"/>
</new-pq>
<new-ip level="1" id="d1e569a1814" content="ABC">
<new-ip level="2" id="d1e569a1815" content="TXTXT"/>
</new-ip>
<new-pq level="1" id="d1e569a1911" content="XPQ">
<new-pq level="2" id="d1e569a1912" content="1234">
<new-pq level="3" id="d1e569a1913" content="345">
<new-pq level="4" id="d1e569a1914" content="456">
<new-pq level="5" id="d1e569a1915" content="567"/>
</new-pq>
</new-pq>
</new-pq>
</new-pq>
<new-ip level="1" id="d1e666a1814" content="ABC">
<new-ip level="2" id="d1e666a1815" content="TXTXT"/>
</new-ip>
<new-pq level="1" id="d1e666a1911" content="XPQ">
<new-pq level="2" id="d1e666a1912" content="1234">
<new-pq level="3" id="d1e666a1913" content="8787"/>
</new-pq>
</new-pq>
</out>
I realize there are far more nodes than in your wanted result, but I think their id attributes are unique.
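To check that assumption rather than eyeball it, a small XSLT 2.0 stylesheet along these lines could list any id value shared by more than one ip or pq element. This is only a sketch; the message text is made up for illustration, and for the input as posted I would expect it to print nothing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Group every ip and pq element in the document by its id attribute -->
    <xsl:for-each-group select="//ip | //pq" group-by="@id">
      <!-- Report only ids that are shared by more than one element -->
      <xsl:if test="count(current-group()) gt 1">
        <xsl:value-of select="concat(current-grouping-key(), ' occurs ',
                                     count(current-group()), ' times&#10;')"/>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>
</xsl:transform>
```

If this reports nothing, grouping by id cannot merge anything, which would explain why the result above is as large as the input.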
The output you have posted as wanted is created when using the content attribute and not the id attribute to identify ip and pq elements:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index ip and pq elements together by their content attribute -->
  <xsl:key name="content" match="ip | pq" use="@content"/>

  <!-- Identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the cat wrapper but process its children -->
  <xsl:template match="cat">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- First element with a given content: output it with the children of all elements sharing that content -->
  <xsl:template match="ip[. is key('content', @content)[1]] | pq[. is key('content', @content)[1]]">
    <new>
      <xsl:apply-templates select="@* , key('content', @content)/*"/>
    </new>
  </xsl:template>

  <!-- Any later element with the same content is suppressed -->
  <xsl:template match="ip[not(. is key('content', @content)[1])] | pq[not(. is key('content', @content)[1])]"/>
</xsl:transform>
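On the question of whether this needs recursion: the key-based stylesheet above avoids it, but a recursive xsl:for-each-group is another way to get a result of the same shape. The following is only a sketch, assuming that elements should be merged when they share a content value under the same already merged parent. The template name merge and its nodes parameter are names invented for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output indent="yes"/>

  <!-- Hypothetical helper: merge the elements in $nodes that share a
       content value, then recurse into the combined children of each group -->
  <xsl:template name="merge">
    <xsl:param name="nodes" as="element()*"/>
    <xsl:for-each-group select="$nodes" group-by="@content">
      <new>
        <!-- The context item is the first element of the group,
             so its attributes (including id) are the ones copied -->
        <xsl:copy-of select="@*"/>
        <!-- Children of all group members, merged one level down -->
        <xsl:call-template name="merge">
          <xsl:with-param name="nodes" select="current-group()/*"/>
        </xsl:call-template>
      </new>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Start from the ip and pq children of all cat elements -->
  <xsl:template match="/out">
    <out>
      <xsl:call-template name="merge">
        <xsl:with-param name="nodes" select="cat/*"/>
      </xsl:call-template>
    </out>
  </xsl:template>
</xsl:transform>
```

Unlike the key-based version, this groups only among the children of an already merged parent, so the same content value under two different parents would yield two new elements. As in the key-based version, the attributes of each new element come from the first element of its group.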
Upvotes: 1