XSLT-explorer

Reputation: 97

Split XML file according to attribute's value

I need to split an XML file based on an attribute's value. Is it possible to do this with XSLT 1.0? If it cannot be done with version 1.0, I would appreciate XSLT code for a higher version.

Here the split-attribute's value is numeric (10, 11, 12 etc.), but I suppose the principle of the solution could be universal for numeric and non-numeric sequences. A new file is produced when the system finds the first new (changed) value of split-attribute.

(Optional question.) How large can the XML files be for this operation? Is it possible to deal with a 3 GB file? A 30 GB file? Are there any RAM requirements for dealing with such file sizes?

SOURCE:

<objects>
  <obj attribute-1="value" attribute-2="value2" split-attribute="10"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="10"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="10"/>

  <obj attribute-1="value" attribute-2="value2" split-attribute="11"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="11"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="11"/>

  <obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
</objects>

DESIRED OUTPUT

<objects>
  <obj attribute-1="value" attribute-2="value2" split-attribute="10"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="10"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="10"/>
</objects>
<!--=========================== file-1.xml ======================-->


<objects>
  <obj attribute-1="value" attribute-2="value2" split-attribute="11"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="11"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="11"/>
</objects>
<!--=========================== file-2.xml ======================-->

<objects>
  <obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
  <obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
</objects>
<!--=========================== file-3.xml ======================-->

Upvotes: 0

Views: 130

Answers (2)

Martin Honnen

Reputation: 167696

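You already have a nice XSLT 2.0 answer, although even there I think group-adjacent fits your requirement better ("new file is produced when system finds the first new (changed) value of split-attribute"): group-by collects all obj elements with the same value into one group even when they are not contiguous, whereas group-adjacent starts a new group each time the value changes. To make it work with XSLT 3.0 and streaming (with a processor that supports it, such as Saxon 9 EE), you could use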

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>

  <xsl:template match="/objects">
    <xsl:for-each-group select="obj" group-adjacent="@split-attribute">
      <xsl:result-document href="file-{position()}.xml" indent="yes">
        <objects>
          <xsl:copy-of select="current-group()" />
        </objects>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>

That way it should work even with very large files, since a streaming processor does not need to hold the whole input tree in memory.

Upvotes: 0

zx485

Reputation: 29042

This can be done with XSLT 2.0 and above. The required xsl:result-document instruction was introduced in version 2.0; XSLT 1.0 itself has no standard way to write more than one output document.

Now the solution is straightforward:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/objects">
    <xsl:for-each-group select="obj" group-by="@split-attribute">
      <xsl:result-document href="{concat('File-',position(),'.xml')}" indent="yes">
        <objects>
          <xsl:copy-of select="current-group()" />
        </objects>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>

The output is as desired and consists of three separate files.
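
If you are stuck with XSLT 1.0, here is a rough sketch, assuming your processor supports the EXSLT extension element exsl:document (xsltproc/libxslt does, as far as I know; many other 1.0 processors do not, or offer a differently named extension). The key name by-split is my own choice. It uses Muenchian grouping, so like group-by above it collects equal values across the whole file rather than only adjacent ones, which gives the same result for your sample:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">

  <!-- Index every obj by its split-attribute value (Muenchian grouping) -->
  <xsl:key name="by-split" match="obj" use="@split-attribute"/>

  <xsl:template match="/objects">
    <!-- Visit only the first obj of each distinct split-attribute value -->
    <xsl:for-each select="obj[generate-id() = generate-id(key('by-split', @split-attribute)[1])]">
      <!-- exsl:document is an extension element, not part of XSLT 1.0 itself -->
      <exsl:document href="file-{position()}.xml" method="xml" indent="yes">
        <objects>
          <!-- Copy all obj elements that share this value -->
          <xsl:copy-of select="key('by-split', @split-attribute)"/>
        </objects>
      </exsl:document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

Like the 2.0 version, this builds the whole input tree in memory, so I would not expect it to cope with multi-gigabyte files.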

Upvotes: 2
