Reto


XSLT 3.0 burst-streaming - how to store/get value of another branch

We are using Saxon EE 12.x to transform large (multi-GB) XML files into a much smaller internal XML structure. We would like to reduce memory consumption, and burst-streaming on each <Document> node seems to be a perfect match for our application. It works, but we are struggling with a seemingly simple construct: in streaming mode, how do we take a value such as the <PrintDate> from the <ExportHeader> branch and add it as an attribute to each output <Doc> element?

The source XML contains thousands of <Document> nodes, each containing thousands of lines of XML. The <ExportHeader> node is very small and contains only some meta-information. A reduced example of the source XML looks like this:

<Export>
  <ExportHeader>
    <PrintDate>2024-12-24</PrintDate>
  </ExportHeader>
  <ExportContent>
    <Document>
      <LotsOfXml></LotsOfXml>
    </Document>
    <Document>
      <LotsOfXml></LotsOfXml>
    </Document>
    <Document>
      <LotsOfXml></LotsOfXml>
    </Document>
  </ExportContent>
</Export>

Our target XML should look like this:

<Exp>
  <Doc PrintDate="???">
    <LotsOfTransformedXML></LotsOfTransformedXML>
  </Doc>
  <Doc PrintDate="???">
    <LotsOfTransformedXML></LotsOfTransformedXML>
  </Doc>
  <Doc PrintDate="???">
    <LotsOfTransformedXML></LotsOfTransformedXML>
  </Doc>
</Exp>

A very simple XSLT would look like this, but we don't know how to get the <PrintDate> into the <Doc> attribute without breaking streamability. Can an <xsl:accumulator> be used to store and retrieve the value, and if so, how?

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:math="http://www.w3.org/2005/xpath-functions/math"
  exclude-result-prefixes="xs math"
  version="3.0">
  
  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>
  <xsl:mode name="insideDoc" streamable="no" on-no-match="shallow-copy"/>
  
  <xsl:template match="Export">
    <xsl:element name="Exp">
      <xsl:apply-templates></xsl:apply-templates>
    </xsl:element>
  </xsl:template>
  
  <xsl:template match="PrintDate">
    <!-- how to keep/store this PrintDate, so we can add it to the Doc elements as an attribute later... -->
  </xsl:template>

  <xsl:template match="ExportContent">
    <xsl:apply-templates select="copy-of(Document)" mode="insideDoc"/>
  </xsl:template>
  
  <xsl:template match="Document" mode="insideDoc">
    <xsl:element name="Doc">
      <xsl:attribute name="PrintDate" select="'???'"/>
      <xsl:apply-templates select="*" mode="insideDoc"/>      
    </xsl:element>
  </xsl:template>
  
  <xsl:template match="LotsOfXml" mode="insideDoc">
    <xsl:element name="LotsOfTransformedXML">
    </xsl:element>
  </xsl:template>
  
</xsl:stylesheet>


Answers (1)

Martin Honnen


Use a capturing accumulator, a Saxon extension (https://www.saxonica.com/html/documentation12/extensions/attributes/capture.html), e.g.

<xsl:mode streamable="yes" on-no-match="shallow-skip" use-accumulators="PrintDate"/>

<xsl:accumulator name="PrintDate" as="element(PrintDate)*" initial-value="()" streamable="yes">
  <xsl:accumulator-rule saxon:capture="yes" match="ExportHeader/PrintDate"
    phase="end" select="$value, ."/>
</xsl:accumulator>

Declare xmlns:saxon="http://saxon.sf.net/". ...

<xsl:attribute name="PrintDate" select="accumulator-before('PrintDate')"/>
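
For reference, here is a minimal sketch of how the fragments above might slot into the stylesheet from the question. The placement is an assumption on my part: the xsl:attribute goes into the existing Document template in mode insideDoc, and the empty PrintDate template is dropped because the accumulator rule takes over its job. It also relies on copy-of() carrying the accumulator values over to the copied Document nodes, which is my reading of the XSLT 3.0 rules on copying accumulator values. I have not run this against a multi-GB file.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:saxon="http://saxon.sf.net/"
  exclude-result-prefixes="saxon"
  version="3.0">

  <!-- The streamed default mode must list the accumulator it uses -->
  <xsl:mode streamable="yes" on-no-match="shallow-skip" use-accumulators="PrintDate"/>
  <xsl:mode name="insideDoc" streamable="no" on-no-match="shallow-copy"/>

  <!-- saxon:capture="yes" lets the end-phase rule see the whole PrintDate element -->
  <xsl:accumulator name="PrintDate" as="element(PrintDate)*" initial-value="()" streamable="yes">
    <xsl:accumulator-rule saxon:capture="yes" match="ExportHeader/PrintDate"
      phase="end" select="$value, ."/>
  </xsl:accumulator>

  <xsl:template match="Export">
    <Exp>
      <xsl:apply-templates/>
    </Exp>
  </xsl:template>

  <!-- Burst step: each Document is copied into memory and processed unstreamed -->
  <xsl:template match="ExportContent">
    <xsl:apply-templates select="copy-of(Document)" mode="insideDoc"/>
  </xsl:template>

  <xsl:template match="Document" mode="insideDoc">
    <Doc>
      <!-- Assumed placement: read the captured value on the copied Document node -->
      <xsl:attribute name="PrintDate" select="accumulator-before('PrintDate')"/>
      <xsl:apply-templates select="*" mode="insideDoc"/>
    </Doc>
  </xsl:template>

  <xsl:template match="LotsOfXml" mode="insideDoc">
    <LotsOfTransformedXML/>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input from the question, the intent is that each <Doc> ends up with PrintDate="2024-12-24". Note that this only works because <ExportHeader> precedes <ExportContent> in document order. If the header came after the documents, the accumulator would still be empty when the <Document> elements are processed.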
