Reputation: 97
I need to split an XML file based on an attribute's value. Is this possible with XSLT 1.0? If it cannot be done with version 1.0, I would appreciate XSLT code for a higher version.
Here the values of split-attribute are numeric (10, 11, 12, etc.), but I suppose the principle of the solution could be universal for numeric and non-numeric sequences. A new file is produced when the system finds the first new (changed) value of split-attribute.
(Optional question.) How large can the XML files be for this operation? Is it possible to process a 3 GB file? A 30 GB file? Are there any RAM requirements for dealing with such file sizes?
SOURCE:
<objects>
<obj attribute-1="value" attribute-2="value2" split-attribute="10"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="10"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="10"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="11"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="11"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="11"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
</objects>
DESIRED OUTPUT
<objects>
<obj attribute-1="value" attribute-2="value2" split-attribute="10"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="10"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="10"/>
</objects>
<!--=========================== file-1.xml ======================-->
<objects>
<obj attribute-1="value" attribute-2="value2" split-attribute="11"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="11"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="11"/>
</objects>
<!--=========================== file-2.xml ======================-->
<objects>
<obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
<obj attribute-1="value" attribute-2="value2" split-attribute="12"/>
</objects>
<!--=========================== file-3.xml ======================-->
Upvotes: 0
Views: 130
Reputation: 167696
You already have a nice XSLT 2.0 answer, although even there I think group-adjacent fits your requirement better than group-by, since you want a new file whenever the value of split-attribute changes. To make it work with XSLT 3.0 and streaming (which needs a processor that supports it, such as Saxon 9 EE), you could use:
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:mode streamable="yes"/>
<xsl:template match="/objects">
<xsl:for-each-group select="obj" group-adjacent="@split-attribute">
<xsl:result-document href="file-{position()}.xml" indent="yes">
<objects>
<xsl:copy-of select="current-group()" />
</objects>
</xsl:result-document>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
That way it should work even with very large files, because a streaming processor does not need to hold the whole input tree in memory.
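The difference between group-by and group-adjacent only shows up when a value reappears later in the input. Here is a minimal sketch of the two behaviours in Python (not XSLT, used purely to illustrate the semantics). The sample values and the function names `group_adjacent` and `group_by` are made up for this illustration and are not part of any XSLT API.

```python
from itertools import groupby

# Hypothetical split-attribute values; note that "10" reappears at the end.
values = ["10", "10", "11", "11", "10"]


def group_adjacent(items):
    """Start a new group each time the value changes (group-adjacent style)."""
    # itertools.groupby only merges consecutive equal items.
    return [list(run) for _, run in groupby(items)]


def group_by(items):
    """Collect all equal values into one group, ordered by first appearance
    (group-by style)."""
    groups = {}
    for item in items:
        groups.setdefault(item, []).append(item)
    return list(groups.values())


print(group_adjacent(values))  # [['10', '10'], ['11', '11'], ['10']]
print(group_by(values))        # [['10', '10', '10'], ['11', '11']]
```

With the sample input from the question, where each value forms a single run, both approaches give the same three groups. They only diverge when the input is not sorted by split-attribute.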
Upvotes: 0
Reputation: 29042
This can be done with XSLT 2.0 and above. The required xsl:result-document instruction was introduced in version 2.0; XSLT 1.0 has no standard way to write multiple output files.
Now the solution is straightforward:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/objects">
<xsl:for-each-group select="obj" group-by="@split-attribute">
<xsl:result-document href="{concat('file-',position(),'.xml')}" indent="yes">
<objects>
<xsl:copy-of select="current-group()" />
</objects>
</xsl:result-document>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
The output is as desired and consists of three separate files.
Upvotes: 2