nine9ths

Reputation: 795

Using xsl:accumulator with xsl:try/xsl:catch

I have a very large input document (thousands of Records) with a structure something like this (Data represents many child elements):

<Input>
  <Record id="1">
    <Data/>
  </Record>
  <Record id="2">
    <Data/>
  </Record>
  <Record id="3">
    <Data/>
  </Record>
  <Record id="4">
    <Data/>
  </Record>
  <Record id="5">
    <Data/>
  </Record>
  <Record id="6">
    <!-- This is bad data -->
    <BadData/>
  </Record>
  <Record id="7">
    <Data/>
  </Record>
  <Record id="8">
    <Data/>
  </Record>
  <Record id="9">
    <!-- Also bad data -->
    <BadData/>
  </Record>
</Input>

I'm processing it with a stylesheet that performs a complex transform on each Record, which can raise many kinds of dynamic error. In this application, if a few records have bad data I would prefer not to halt the transform, but I would like to know about the errors so I can fix them later. I'm using xsl:try/xsl:catch to allow processing to continue:

<xsl:stylesheet
  version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:err="http://www.w3.org/2005/xqt-errors"
  exclude-result-prefixes="xs err">

  <xsl:output indent="yes"/>

  <xsl:strip-space elements="*"/>

  <xsl:template match="Input">
    <Output>
      <xsl:apply-templates/>
    </Output>
  </xsl:template>

  <xsl:template match="Record">
    <xsl:variable name="preprocessed" as="element(GoodData)?">
      <xsl:try>
        <xsl:apply-templates mode="preprocess" select="."/>
        <xsl:catch>
          <xsl:message expand-text="yes">Couldn't create good data for {@id} Code: {$err:code} {$err:description}</xsl:message>
        </xsl:catch>
      </xsl:try>
    </xsl:variable>
    <!-- Do some more logic on the preprocessed record -->
    <xsl:if test="$preprocessed">
      <NewRecord id="{@id}">
        <xsl:sequence select="$preprocessed"/>
      </NewRecord>
    </xsl:if>
  </xsl:template>



  <xsl:template mode="preprocess" match="Record">
    <!-- This represents a very complex transform with many potential dynamic errors -->
    <xsl:variable name="source" as="element(Data)" select="*"/>
    <xsl:if test="$source">
      <GoodData/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

This works fine, but it's a pain to dig through the large input documents to find the few records that failed. What I'd like to do is write the source of the failing Record elements to a new Input document using xsl:result-document. I'm trying to add an xsl:accumulator, something like this:

<xsl:accumulator name="failed-source" initial-value="()" as="element(Record)*">
  <xsl:accumulator-rule match="Record" phase="end">
    <xsl:sequence select="$value, .[false()(:test for failure:)]"/>
  </xsl:accumulator-rule>
</xsl:accumulator>

<xsl:template match="Input">
  <Output>
    <xsl:apply-templates/>
  </Output>
  <xsl:if test="accumulator-after('failed-source')">
    <xsl:result-document href="failed.input.xml">
      <Input>
        <xsl:sequence select="accumulator-after('failed-source')"/>
      </Input>
    </xsl:result-document>
  </xsl:if>
</xsl:template>

However, I can't figure out what the predicate in the xsl:accumulator-rule should be, or if it's even possible to use this pattern. Can a single result document be created without restructuring the stylesheet?

NB: I'm aware of the following solution, but it wasn't my first choice because it seems like it could have much higher memory requirements, though perhaps that isn't true. I could also write the Records out to individual files, but I consider that dangerous because one source document might generate thousands of failures.

<xsl:template match="Input">
  <xsl:variable name="processed" as="document-node()">
    <xsl:document>
      <xsl:apply-templates/>
    </xsl:document>
  </xsl:variable>
  <xsl:if test="$processed/NewRecord">
    <Output>
      <xsl:sequence select="$processed/NewRecord"/>
    </Output>
  </xsl:if>
  <xsl:if test="$processed/Record">
    <xsl:result-document href="failed.input.xml">
      <Input>
        <xsl:sequence select="$processed/Record"/>
      </Input>
    </xsl:result-document>
  </xsl:if>
</xsl:template>

<xsl:template match="Record">
  <xsl:variable name="preprocessed" as="element(GoodData)?">
    <xsl:try>
      <xsl:apply-templates mode="preprocess" select="."/>
      <xsl:catch>
        <xsl:message expand-text="yes">Couldn't create good data for {@id} Code: {$err:code} {$err:description}</xsl:message>
      </xsl:catch>
    </xsl:try>
  </xsl:variable>
  <!-- Do some more logic on the preprocessed record -->
  <xsl:choose>
    <xsl:when test="$preprocessed">
      <NewRecord id="{@id}">
        <xsl:sequence select="$preprocessed"/>
      </NewRecord>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="."/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Upvotes: 1

Views: 987

Answers (1)

Michael Kay

Reputation: 163595

It's an interesting approach.

The value of an accumulator must always be a pure function of an input node. There's no way of feeding in information from other activities, e.g. whether the processing of the node failed. It's not clear to me whether you can detect the "bad records" independently from the processing that you carry out on those records: if you can, that is, if you are essentially doing custom validation on the input, then this pattern might work quite well. (But in that case, I don't think you would be doing try/catch. Rather, your main processing function would first check the accumulator to see if the data is valid.)
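To make the "custom validation" case concrete, here is a minimal sketch, assuming that a bad record can be recognised from the input alone. The function f:is-valid, its namespace urn:example:local, and its rule (exactly one child element, and it is Data) are hypothetical stand-ins that only fit the toy input in the question; the real check would have to be written separately from the real transform.

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="urn:example:local"
  exclude-result-prefixes="xs f">

  <!-- Make the accumulator applicable to the principal input document -->
  <xsl:mode use-accumulators="#all"/>

  <!-- Hypothetical validity check (an assumption, not from the question):
       a Record is valid if its only child element is Data -->
  <xsl:function name="f:is-valid" as="xs:boolean">
    <xsl:param name="record" as="element(Record)"/>
    <xsl:sequence select="count($record/*) = 1 and exists($record/Data)"/>
  </xsl:function>

  <!-- Collects invalid Records; the value depends only on the input nodes -->
  <xsl:accumulator name="failed-source" initial-value="()" as="element(Record)*">
    <xsl:accumulator-rule match="Record" phase="end"
                          select="$value, .[not(f:is-valid(.))]"/>
  </xsl:accumulator>

  <xsl:template match="Input">
    <Output>
      <!-- Check validity up front instead of relying on try/catch -->
      <xsl:apply-templates select="Record[f:is-valid(.)]"/>
    </Output>
    <xsl:variable name="failed" select="accumulator-after('failed-source')"/>
    <xsl:if test="exists($failed)">
      <xsl:result-document href="failed.input.xml">
        <Input>
          <xsl:sequence select="$failed"/>
        </Input>
      </xsl:result-document>
    </xsl:if>
  </xsl:template>

  <!-- Placeholder for the real per-record transform -->
  <xsl:template match="Record">
    <NewRecord id="{@id}">
      <GoodData/>
    </NewRecord>
  </xsl:template>

</xsl:stylesheet>
```

In a non-streamed transform the expression Record[not(f:is-valid(.))] would select the same nodes directly, so the accumulator mainly pays off when the input is streamed or the check is expensive enough that it should run once per node.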

Note that the spec for accumulators allows the computation of one accumulator to access other accumulators, but this is not currently implemented in Saxon.

I think the more usual way of tackling this is probably to write the results of successful processing and the reports of unsuccessful processing to the same result tree, and then split this in a subsequent transformation pass. Unfortunately XSLT 3.0 streaming capabilities don't have anything to offer in the area of multi-pass processing. For the splitting pass, however, xsl:fork might well be suitable.
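A splitting pass along those lines might look like the sketch below. It assumes the first pass wrote both NewRecord elements (successes) and copies of the failing Record elements under a single wrapper element; the wrapper name Mixed is hypothetical. The streamable mode is optional and needs a streaming-capable processor; without it, the xsl:fork prongs are simply evaluated in turn.

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Optional: remove this declaration to run without streaming -->
  <xsl:mode streamable="yes"/>

  <!-- "Mixed" is a hypothetical wrapper produced by the first pass -->
  <xsl:template match="/Mixed">
    <xsl:fork>
      <!-- Prong 1: successful results go to the principal output -->
      <xsl:sequence>
        <Output>
          <xsl:copy-of select="NewRecord"/>
        </Output>
      </xsl:sequence>
      <!-- Prong 2: failed source records go to a secondary document -->
      <xsl:sequence>
        <xsl:result-document href="failed.input.xml">
          <Input>
            <xsl:copy-of select="Record"/>
          </Input>
        </xsl:result-document>
      </xsl:sequence>
    </xsl:fork>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the xsl:if guard in the question, this version always writes failed.input.xml, even when no record failed, because a streamed pass cannot look ahead to see whether any Record elements exist before opening the result document.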

Upvotes: 2
