Reputation: 15
I am transforming a huge data set (~1000k records) using XSLT 3.0. Because the input XML message is very big, I am getting Java heap memory errors in my ERP system (Workday). I tried a streaming-only XSLT approach but couldn't make it work. Can someone please help me transform the data in a memory-efficient way?
Sample input:
<?xml version="1.0" encoding="UTF-8"?>
<a:Report_Data xmlns:a="urn:com.workday.report/INT_Currency_Conversion_Rates_-_Monthly_Source">
<a:Report_Entry>
<a:Source_Currency>USD</a:Source_Currency>
<a:Target_Currency>INR</a:Target_Currency>
<a:Currency_Rate>76.33</a:Currency_Rate>
</a:Report_Entry>
<a:Report_Entry>
<a:Source_Currency>USD</a:Source_Currency>
<a:Target_Currency>CHN</a:Target_Currency>
<a:Currency_Rate>16.33</a:Currency_Rate>
</a:Report_Entry>
<a:Report_Entry>
<a:Source_Currency>CHN</a:Source_Currency>
<a:Target_Currency>INR</a:Target_Currency>
<a:Currency_Rate>26.33</a:Currency_Rate>
</a:Report_Entry>
</a:Report_Data>
XSLT code that I have tried:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:a="urn:com.workday.report/INT_Currency_Conversion_Rates_-_Monthly_Source"
xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"
version="3.0">
<xsl:output method="xml" indent="no" omit-xml-declaration="yes" />
<xsl:mode streamable="yes" on-no-match="shallow-skip" />
<xsl:template match="a:Report_Data">
<RTMap>
<xsl:fork>
<xsl:for-each-group select="a:Report_Entry/copy-of()" group-by="a:Source_Currency">
<xsl:for-each-group select="current-group()" group-by="a:Target_Currency">
<Row>
<Map_Rate><xsl:value-of select="avg(current-group()/a:Currency_Rate)"/></Map_Rate>
<Map_From_Currency><xsl:value-of select="a:Source_Currency"/></Map_From_Currency>
<Map_Target_Currency><xsl:value-of select="a:Target_Currency"/></Map_Target_Currency>
</Row>
</xsl:for-each-group>
</xsl:for-each-group>
</xsl:fork>
</RTMap>
</xsl:template>
</xsl:stylesheet>
Thank you, Jay
Upvotes: 0
Views: 319
Reputation: 167696
In XSLT 3 you can use a composite grouping key, and if you use copy-of() you don't need the xsl:fork:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xpath-default-namespace="urn:com.workday.report/INT_Currency_Conversion_Rates_-_Monthly_Source"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:template match="Report_Data">
<RTMap>
<xsl:for-each-group select="Report_Entry!copy-of()" composite="yes" group-by="Source_Currency, Target_Currency">
<Row>
<Map_Rate>{avg(current-group()/Currency_Rate)}</Map_Rate>
<Map_From_Currency>{current-grouping-key()[1]}</Map_From_Currency>
<Map_Target_Currency>{current-grouping-key()[2]}</Map_Target_Currency>
</Row>
</xsl:for-each-group>
</RTMap>
</xsl:template>
<xsl:output method="xml" indent="yes"/>
<xsl:mode on-no-match="shallow-skip" streamable="yes"/>
</xsl:stylesheet>
But in the end, any group-by needs to buffer groups, because you don't know whether the last Report_Entry might belong to the first group, so any grouping of that input based on those keys will consume memory. Streamed grouping with low memory consumption works if you use group-starting-with or group-adjacent, provided the input data and the requirements allow that, but group-by is always going to buffer groups.
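To illustrate the group-adjacent option, here is a minimal sketch. It assumes (this is not stated in the question) that the report is already ordered so that all Report_Entry elements with the same Source_Currency and Target_Currency pair follow one another. If that assumption does not hold, entries for the same pair that are not adjacent end up in separate Row elements, so the result would differ from the group-by version. How much memory this actually saves depends on the XSLT processor and is something to measure in your environment:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xpath-default-namespace="urn:com.workday.report/INT_Currency_Conversion_Rates_-_Monthly_Source"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:output method="xml" indent="yes"/>
  <xsl:mode on-no-match="shallow-skip" streamable="yes"/>

  <xsl:template match="Report_Data">
    <RTMap>
      <!-- ASSUMPTION: entries with the same currency pair are adjacent in the input.
           copy-of() materializes one small Report_Entry at a time, and
           group-adjacent closes a group as soon as the composite key changes,
           so only the current group has to be held rather than all groups. -->
      <xsl:for-each-group select="Report_Entry!copy-of()" composite="yes"
                          group-adjacent="Source_Currency, Target_Currency">
        <Row>
          <Map_Rate>{avg(current-group()/Currency_Rate)}</Map_Rate>
          <Map_From_Currency>{current-grouping-key()[1]}</Map_From_Currency>
          <Map_Target_Currency>{current-grouping-key()[2]}</Map_Target_Currency>
        </Row>
      </xsl:for-each-group>
    </RTMap>
  </xsl:template>

</xsl:stylesheet>
```

The only change from the group-by stylesheet is the grouping attribute, so if the report can be sorted by source and target currency before it reaches the transformation, the rest of the integration can stay as it is.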
Upvotes: 1