Jaya Krishna

Reputation: 15

Streaming XSLT 3 with Nested Group by logic

I am transforming a huge data set (~1000k records) with XSLT 3.0. Because the input XML message is very big, I am getting Java heap memory errors in my ERP system (Workday). I tried a streaming-only XSLT but couldn't make it work. Can someone please help me transform the data in a memory-efficient way?

<?xml version="1.0" encoding="UTF-8"?>
<a:Report_Data xmlns:a="urn:com.workday.report/INT_Currency_Conversion_Rates_-_Monthly_Source">
    
    <a:Report_Entry>
        <a:Source_Currency>USD</a:Source_Currency>
        <a:Target_Currency>INR</a:Target_Currency>
        <a:Currency_Rate>76.33</a:Currency_Rate>
    </a:Report_Entry>
    <a:Report_Entry>
        <a:Source_Currency>USD</a:Source_Currency>
        <a:Target_Currency>CHN</a:Target_Currency>
        <a:Currency_Rate>16.33</a:Currency_Rate>
    </a:Report_Entry>
    <a:Report_Entry>
        <a:Source_Currency>CHN</a:Source_Currency>
        <a:Target_Currency>INR</a:Target_Currency>
        <a:Currency_Rate>26.33</a:Currency_Rate>
    </a:Report_Entry>
    
</a:Report_Data>

XSLT code that I have tried:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:a="urn:com.workday.report/INT_Currency_Conversion_Rates_-_Monthly_Source" 
    xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"
    version="3.0">
    
    <xsl:output method="xml" indent="no" omit-xml-declaration="yes" />
    
    <xsl:mode streamable="yes" on-no-match="shallow-skip" />
    
    <xsl:template match="a:Report_Data">
        <RTMap>
            
            <xsl:fork>
                <xsl:for-each-group select="a:Report_Entry/copy-of()" group-by="a:Source_Currency">
                    <xsl:for-each-group select="current-group()" group-by="a:Target_Currency">
                        <Row>
                            <Map_Rate><xsl:value-of select="avg(current-group()/a:Currency_Rate)"/></Map_Rate>
                            <Map_From_Currency><xsl:value-of select="a:Source_Currency"/></Map_From_Currency>
                            <Map_Target_Currency><xsl:value-of select="a:Target_Currency"/></Map_Target_Currency>
                        </Row>
                    </xsl:for-each-group>
                </xsl:for-each-group>
            </xsl:fork>
            
        </RTMap>
    </xsl:template> 
    
</xsl:stylesheet>

Thank you, Jay

Upvotes: 0

Views: 319

Answers (1)

Martin Honnen

Reputation: 167696

In XSLT 3 you can use a composite grouping key (composite="yes") instead of nesting two xsl:for-each-group instructions, and if you use copy-of() you don't need the xsl:fork:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xpath-default-namespace="urn:com.workday.report/INT_Currency_Conversion_Rates_-_Monthly_Source"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:template match="Report_Data">
   <RTMap>
     <xsl:for-each-group select="Report_Entry!copy-of()" composite="yes" group-by="Source_Currency, Target_Currency">
       <Row>
         <Map_Rate>{avg(current-group()/Currency_Rate)}</Map_Rate>
         <Map_From_Currency>{current-grouping-key()[1]}</Map_From_Currency>
         <Map_Target_Currency>{current-grouping-key()[2]}</Map_Target_Currency> 
       </Row>
     </xsl:for-each-group>
   </RTMap>
  </xsl:template>

  <xsl:output method="xml" indent="yes"/>

  <xsl:mode on-no-match="shallow-skip" streamable="yes"/>

</xsl:stylesheet>

But in the end, any group-by has to buffer its groups, because the processor cannot know whether the last Report_Entry belongs to the first group. Grouping this input on those keys will therefore consume memory. Streamed grouping with low memory consumption works if you use group-starting-with or group-adjacent, provided the input data and the requirements allow that, but group-by is always going to buffer groups.
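
As a rough sketch of the group-adjacent variant: this only gives the same result as group-by under the assumption (not stated in the question) that the report is already sorted, so that all Report_Entry elements sharing a Source_Currency/Target_Currency pair sit next to each other. If a pair shows up again later in the input, it produces a second Row instead of being merged into the first one. I have not run this against Workday, so treat it as a starting point rather than a tested solution:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xpath-default-namespace="urn:com.workday.report/INT_Currency_Conversion_Rates_-_Monthly_Source"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:mode on-no-match="shallow-skip" streamable="yes"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="Report_Data">
    <RTMap>
      <!-- ASSUMPTION: entries with the same currency pair are adjacent in the input.
           With group-adjacent a group ends as soon as the key changes, so only the
           copies of the current group need to be held at any one time. -->
      <xsl:for-each-group select="Report_Entry!copy-of()" composite="yes"
                          group-adjacent="Source_Currency, Target_Currency">
        <Row>
          <Map_Rate>{avg(current-group()/Currency_Rate)}</Map_Rate>
          <Map_From_Currency>{current-grouping-key()[1]}</Map_From_Currency>
          <Map_Target_Currency>{current-grouping-key()[2]}</Map_Target_Currency>
        </Row>
      </xsl:for-each-group>
    </RTMap>
  </xsl:template>

</xsl:stylesheet>
```

If the report cannot be sorted at the source, this variant is not a drop-in replacement, and the group-by version above (with its buffering) remains the one that gives one Row per currency pair.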

Upvotes: 1
