Aaron

Reputation: 271

Using XSLT to split duplicate text() and group non-duplicates together

I have the following input XML:

<root>
    <element>
          <id>1</id>
          <text><![CDATA[My text 1]]></text>
    </element>
    <element>
          <id>2</id>
          <text><![CDATA[My text 1]]></text>
    </element>
    <element>
          <id>3</id>
          <text><![CDATA[My text 2]]></text>
    </element>
    <element>
          <id>4</id>
          <text><![CDATA[My text 2]]></text>
    </element>
    <element>
          <id>5</id>
          <text><![CDATA[My text 3]]></text>
    </element>
</root>

I'm looking to transform this with XSLT 2.0 so that elements with a duplicate text() value in the text element are split into separate files, while the non-duplicates are grouped together. This has to work for any number of duplicates (my example only shows two of each). No output file should contain a duplicate text() value, and the elements should be grouped into as few files as possible. For the input above, the output should look like this:

document1.xml

<root>
    <element>
          <id>1</id>
          <text><![CDATA[My text 1]]></text>
    </element>
    <element>
          <id>3</id>
          <text><![CDATA[My text 2]]></text>
    </element>
    <element>
          <id>5</id>
          <text><![CDATA[My text 3]]></text>
    </element>
</root>

document2.xml

<root>
    <element>
          <id>2</id>
          <text><![CDATA[My text 1]]></text>
    </element>
    <element>
          <id>4</id>
          <text><![CDATA[My text 2]]></text>
    </element>
</root>

My existing XSLT snippet is below. I get the feeling I need to collect my duplicates together in my for-each-group (so that I can split them by position), but as written this results in one file per element:

<xsl:for-each-group select="descendant::element" group-by="text[text() = preceding::text/text() or text() = following::text/text()]">
    <xsl:result-document href="{concat($outputdir,'\document',position(),'.xml')}" method="xml" indent="yes" cdata-section-elements="text">
        <root>
            <xsl:copy-of select="."/>
        </root>
    </xsl:result-document>
</xsl:for-each-group>

I'd appreciate any help you're able to offer. Thanks in advance.

Upvotes: 4

Views: 288

Answers (2)

Rudolf Yurgenson

Reputation: 603

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Keep the CDATA wrappers around <text> in every result document -->
    <xsl:output indent="yes" cdata-section-elements="text"/>

    <xsl:template match="/root">
        <xsl:call-template name="out">
            <xsl:with-param name="level" select="1"/>
            <xsl:with-param name="root" select="."/>
        </xsl:call-template>
    </xsl:template>

    <!-- Writes document{$level}.xml with the first element of each text group,
         then recurses on the elements that were not written yet -->
    <xsl:template name="out">
        <xsl:param name="root"/>
        <xsl:param name="level"/>

        <xsl:if test="$root/*">
            <xsl:result-document href="document{$level}.xml">
                <root>
                    <xsl:for-each-group select="$root/*" group-by="text">
                        <!-- current() is the first element of the current group -->
                        <xsl:copy-of select="current()"/>
                    </xsl:for-each-group>
                </root>
            </xsl:result-document>

            <xsl:call-template name="out">
                <xsl:with-param name="level" select="$level+1"/>
                <xsl:with-param name="root">
                    <xsl:for-each-group select="$root/*" group-by="text">
                        <!-- pass on every element except the one already written above -->
                        <xsl:copy-of select="current-group()[position() gt 1]"/>
                    </xsl:for-each-group>
                </xsl:with-param>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>

Upvotes: 0

Martin Honnen

Reputation: 167436

The following reads rather like an imperative solution written in XSLT, but I think it does the job:

<xsl:stylesheet
  version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

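<!-- cdata-section-elements keeps the CDATA wrappers of the input in every result document -->
<xsl:output indent="yes" cdata-section-elements="text"/>

<xsl:template match="root">
  <!-- Step 1: collect the elements into one <group> per distinct text value -->
  <xsl:variable name="groups">
    <xsl:for-each-group select="element" group-by="text">
      <group key="{current-grouping-key()}">
        <xsl:copy-of select="current-group()"/>
      </group>
    </xsl:for-each-group>
  </xsl:variable>
  <!-- Step 2: the largest group determines how many files are needed -->
  <xsl:variable name="max-size" select="max($groups/group/count(element))"/>
  <!-- Step 3: document N receives the N-th element of every group,
       so no file contains the same text twice -->
  <xsl:for-each select="1 to $max-size">
    <xsl:result-document href="document{.}.xml">
      <root>
        <xsl:copy-of select="$groups/group/element[position() eq current()]"/>
      </root>
    </xsl:result-document>
  </xsl:for-each>
</xsl:template>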

</xsl:stylesheet>
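The same idea (document N receives the N-th occurrence of each text value) can probably also be expressed in a single xsl:for-each-group, by using each element's occurrence number as the grouping key. This is a sketch rather than part of the answer above, and it assumes, as in the sample input, that all element nodes are siblings and each has exactly one text child:

```xml
<xsl:stylesheet
  version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output indent="yes" cdata-section-elements="text"/>

<xsl:template match="root">
  <!-- Grouping key: the occurrence number of this element's text value
       (1 for its first occurrence, 2 for its second, ...).
       Assumes every element has exactly one <text> child. -->
  <xsl:for-each-group select="element"
      group-by="for $t in text return count(preceding-sibling::element[text = $t]) + 1">
    <!-- Group N holds the N-th occurrence of every text value, so it has no duplicates -->
    <xsl:result-document href="document{current-grouping-key()}.xml">
      <root>
        <xsl:copy-of select="current-group()"/>
      </root>
    </xsl:result-document>
  </xsl:for-each-group>
</xsl:template>

</xsl:stylesheet>
```

For the sample input the keys are 1, 2, 1, 2, 1 for ids 1 to 5, giving document1.xml with ids 1, 3, 5 and document2.xml with ids 2, 4. Because the key is computed with a preceding-sibling:: scan for every element, this version is quadratic in the number of elements, so for large inputs the two-step version above may scale better.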

Upvotes: 2
