user3629892

Reputation: 3046

XSLT 2.0: create a hierarchy that contains each node only once

I don't know if that's the best title but here's what I'd like to do:

My input file:

    <?xml version="1.0" encoding="UTF-8"?>
    <out>
    <cat id="d1e3">
        <ip level="1" id="d1e3a1814" content="ABC">
            <ip level="2" id="d1e3a1815" content="DEF"/>
        </ip>
        <pq level="1" id="d1e3a1911" content="XPQ"/>
    </cat>
    <cat id="d1e8">
        <ip level="1" id="d1e8a1814" content="ABC">
            <ip level="2" id="d1e8a1815" content="TXTXT"/>
        </ip>
        <pq level="1" id="d1e8a1911" content="XPQ"/>
    </cat>
    <cat id="d1e13">
        <ip level="1" id="d1e13a1814" content="ABC">
            <ip level="2" id="d1e13a1815" content="TXTXT"/>
        </ip>
        <pq level="1" id="d1e13a1911" content="XPQ">
            <pq level="2" id="d1e13a1912" content="1234"/>
        </pq>
    </cat>
    <cat id="d1e569">
        <ip level="1" id="d1e569a1814" content="ABC">
            <ip level="2" id="d1e569a1815" content="TXTXT"/>
        </ip>
        <pq level="1" id="d1e569a1911" content="XPQ">
            <pq level="2" id="d1e569a1912" content="1234">
                <pq level="3" id="d1e569a1913" content="345">
                    <pq level="4" id="d1e569a1914" content="456">
                        <pq level="5" id="d1e569a1915" content="567"/>
                    </pq>
                </pq>
            </pq>
        </pq>
    </cat>
    <cat id="d1e666">
        <ip level="1" id="d1e666a1814" content="ABC">
            <ip level="2" id="d1e666a1815" content="TXTXT"/>
        </ip>
        <pq level="1" id="d1e666a1911" content="XPQ">
            <pq level="2" id="d1e666a1912" content="1234">
                <pq level="3" id="d1e666a1913" content="8787"/>
            </pq>
        </pq>
    </cat>
    </out>

My desired output:

    <?xml version="1.0" encoding="UTF-8"?>
    <out>
        <new level="1" id="d1e3a1814" content="ABC">
            <new level="2" id="d1e3a1815" content="DEF"/>
            <new level="2" id="d1e8a1815" content="TXTXT"/>
        </new>
        <new level="1" id="d1e13a1911" content="XPQ">
            <new level="2" id="d1e569a1912" content="1234">
                <new level="3" id="d1e569a1913" content="345">
                    <new level="4" id="d1e569a1914" content="456">
                        <new level="5" id="d1e569a1915" content="567"/>
                    </new>
                </new>
            </new>
            <new level="3" id="d1e666a1913" content="8787"/>
        </new>
    </out>

I'm using XSLT 2.0 and Saxon 9 HE.

So what happens here is that each ip node occurs only once in the output document, each of its children also occurs only once, and the same holds for the children of those children, and so on. The same applies to the pq nodes.

Each node should occur only once while the hierarchy is maintained. Basically, I need a document from which I can tell which node is the parent of which other node, but I need this information only once per node (so, in essence, each id should appear only once in the output document).

In the input document, ip and pq can have any number of child nodes, and these can vary from element to element.

I have tried using for-each-group on all ip elements, grouping by @id, but I can't figure out how to proceed from there. Do I need to do this recursively?

I'd be thankful for tips and help!

Upvotes: 1

Views: 253

Answers (1)

Martin Honnen

Reputation: 167571

When running

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:key name="ip-id" match="ip" use="@id"/>
    <xsl:key name="pq-id" match="pq" use="@id"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="cat">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- First ip with a given id: emit it once, then process the ip children
         of every ip that shares this id -->
    <xsl:template match="ip[. is key('ip-id', @id)[1]]">
        <new-ip>
            <xsl:apply-templates select="@* , key('ip-id', @id)/ip"/>
        </new-ip>
    </xsl:template>

    <!-- Any later ip with the same id: suppress it -->
    <xsl:template match="ip[not(. is key('ip-id', @id)[1])]"/>

    <!-- Same pattern for pq elements -->
    <xsl:template match="pq[. is key('pq-id', @id)[1]]">
        <new-pq>
            <xsl:apply-templates select="@* , key('pq-id', @id)/pq"/>
        </new-pq>
    </xsl:template>

    <xsl:template match="pq[not(. is key('pq-id', @id)[1])]"/>

</xsl:transform>

against your input, I get the result

<out>
   <new-ip level="1" id="d1e3a1814" content="ABC">
      <new-ip level="2" id="d1e3a1815" content="DEF"/>
   </new-ip>
   <new-pq level="1" id="d1e3a1911" content="XPQ"/>
   <new-ip level="1" id="d1e8a1814" content="ABC">
      <new-ip level="2" id="d1e8a1815" content="TXTXT"/>
   </new-ip>
   <new-pq level="1" id="d1e8a1911" content="XPQ"/>
   <new-ip level="1" id="d1e13a1814" content="ABC">
      <new-ip level="2" id="d1e13a1815" content="TXTXT"/>
   </new-ip>
   <new-pq level="1" id="d1e13a1911" content="XPQ">
      <new-pq level="2" id="d1e13a1912" content="1234"/>
   </new-pq>
   <new-ip level="1" id="d1e569a1814" content="ABC">
      <new-ip level="2" id="d1e569a1815" content="TXTXT"/>
   </new-ip>
   <new-pq level="1" id="d1e569a1911" content="XPQ">
      <new-pq level="2" id="d1e569a1912" content="1234">
         <new-pq level="3" id="d1e569a1913" content="345">
            <new-pq level="4" id="d1e569a1914" content="456">
               <new-pq level="5" id="d1e569a1915" content="567"/>
            </new-pq>
         </new-pq>
      </new-pq>
   </new-pq>
   <new-ip level="1" id="d1e666a1814" content="ABC">
      <new-ip level="2" id="d1e666a1815" content="TXTXT"/>
   </new-ip>
   <new-pq level="1" id="d1e666a1911" content="XPQ">
      <new-pq level="2" id="d1e666a1912" content="1234">
         <new-pq level="3" id="d1e666a1913" content="8787"/>
      </new-pq>
   </new-pq>
</out>

I realize there are far more nodes than in your wanted result, but as far as I can see their id attributes are all unique, so nothing is merged.
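One way to check that assumption is a small stylesheet that reports any id shared by more than one ip or pq element. This is only a sketch for verification, not part of the transformation above; it writes plain text and should print nothing for the posted input, because every id there occurs exactly once.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Group every ip and pq element by its id attribute -->
        <xsl:for-each-group select="//ip | //pq" group-by="@id">
            <!-- Report only ids that occur on more than one element -->
            <xsl:if test="count(current-group()) gt 1">
                <xsl:value-of select="concat(current-grouping-key(), ' occurs ',
                                             count(current-group()), ' times&#10;')"/>
            </xsl:if>
        </xsl:for-each-group>
    </xsl:template>

</xsl:transform>
```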

Output like the one you posted as wanted is created when the content attribute, rather than the id attribute, is used to identify ip and pq elements:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:key name="content" match="ip | pq" use="@content"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="cat">
        <xsl:apply-templates/>
    </xsl:template>

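    <!-- First ip or pq with a given content value: emit it once as "new",
         then process the children of every element sharing that content -->
    <xsl:template match="ip[. is key('content', @content)[1]] | pq[. is key('content', @content)[1]]">
        <new>
            <xsl:apply-templates select="@* , key('content', @content)/*"/>
        </new>
    </xsl:template>

    <!-- Any later element with the same content value: suppress it -->
    <xsl:template match="ip[not(. is key('content', @content)[1])] | pq[not(. is key('content', @content)[1])]"/>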

</xsl:transform>
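As for the question of whether for-each-group can be applied recursively: it can. The following is a minimal sketch of that alternative, not the key-based approach shown above. It assumes that elements should be merged when they have the same element name and the same content attribute under the same (already merged) parent. The named template `merge` and its parameter `nodes` are names made up for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:output indent="yes"/>

    <xsl:template match="/out">
        <out>
            <!-- Start with the top-level ip and pq elements of all cat elements -->
            <xsl:call-template name="merge">
                <xsl:with-param name="nodes" select="cat/*"/>
            </xsl:call-template>
        </out>
    </xsl:template>

    <!-- Hypothetical helper: groups the given elements by name and content,
         emits one "new" element per group and recurses into the children
         of all members of that group -->
    <xsl:template name="merge">
        <xsl:param name="nodes" as="element()*"/>
        <xsl:for-each-group select="$nodes"
                            group-by="concat(name(), '|', @content)">
            <new>
                <!-- The context item is the first element of the group,
                     so its attributes (including id) are the ones kept -->
                <xsl:copy-of select="@*"/>
                <xsl:call-template name="merge">
                    <xsl:with-param name="nodes" select="current-group()/*"/>
                </xsl:call-template>
            </new>
        </xsl:for-each-group>
    </xsl:template>

</xsl:transform>
```

Unlike the single key on content, this variant only merges elements that sit under the same merged parent, so the same content value appearing in two unrelated branches would stay separate. Which behaviour is wanted depends on your data.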

Upvotes: 1
