Jerome

Any limitation with Saxon-EE XSLT v3 Streaming?

I want to apply different transformations to a big XML document using Saxon's XSLT 3.0 streaming capabilities. The problem I'm facing is that this transformation does not work:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="ano contextutil"  xmlns:ano="java:StreamingGenericProcessor"
  xmlns:contextutil="java:GenericAnonymizerContextUtil">
 <xsl:mode streamable="yes"/>
 <xsl:output method="xml"/>
 <xsl:param name="context" as="class:java.lang.Object" xmlns:class="http://saxon.sf.net/java-type"/>
 <xsl:template match="internal/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="email/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="address/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="birthday/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="country/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="external/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="name/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="phone/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="city/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="id/text()"><xsl:value-of select="ano:uuid($context, current(), 'ID')"/></xsl:template>
 <xsl:template match="." >
   <xsl:copy validation="preserve">
     <xsl:apply-templates select="@*" />
     <xsl:apply-templates select="node()" />
   </xsl:copy>
 </xsl:template>
 </xsl:stylesheet>

But this one does (the only difference is that the internal/text() rule has been removed):

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="ano contextutil"  xmlns:ano="java:StreamingGenericProcessor"
  xmlns:contextutil="java:GenericAnonymizerContextUtil">
 <xsl:mode streamable="yes"/>
 <xsl:output method="xml"/>
 <xsl:param name="context" as="class:java.lang.Object" xmlns:class="http://saxon.sf.net/java-type"/>
 <xsl:template match="email/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="address/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="birthday/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="country/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="external/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="name/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="phone/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="city/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="id/text()"><xsl:value-of select="ano:uuid($context, current(), 'ID')"/></xsl:template>
 <xsl:template match="." >
   <xsl:copy validation="preserve">
     <xsl:apply-templates select="@*" />
     <xsl:apply-templates select="node()" />
   </xsl:copy>
 </xsl:template>
 </xsl:stylesheet>

I tested many different scenarios and concluded that as soon as I have more than 9 of these text() template rules, it stops working!

EDIT: "it does not work" means the following: on a specific element named id I apply a Java function. If I have more than 9 xsl:template rules, the output is not modified and my Java function is not called at all. There is no error message.

EDIT2: If I replace the call to the Java function with, for instance, concat(current(), '_ID'), I get the same behaviour, so this is not specific to the Java function at all.

EDIT3:

Here is some sample input data:

<?xml version="1.0" encoding="UTF-8"?>
<table>
  <row>
    <id>10</id>
    <email>[email protected]</email>
    <address>dsffe</address>
    <birthday>10/2018</birthday>
    <country>FR</country>
    <external>zz</external>
    <internal>ww</internal>
    <name>Jean</name>
    <phone>000000</phone>
    <city>Dfegd</city>
  </row>
  <row>
    <id>9</id>
    <email>[email protected]</email>
    <address>sdfzefzef</address>
    <birthday>11/2012</birthday>
    <country>GB</country>
    <external>xx</external>
    <internal>yy</internal>
    <name>Jean-Claude</name>
    <phone>000000</phone>
    <city>dd</city>
  </row>
</table>

This XSLT always works:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
 <xsl:mode streamable="yes"/>
 <xsl:output method="xml"/>
 <xsl:template match="email/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="address/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="birthday/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="country/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="external/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="name/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="phone/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="city/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="id/text()"><xsl:value-of select="concat(current(), '_ID')"/></xsl:template>
 <xsl:template match="." >
   <xsl:copy validation="preserve">
     <xsl:apply-templates select="@*" />
     <xsl:apply-templates select="node()" />
   </xsl:copy>
 </xsl:template>
 </xsl:stylesheet>

The problematic one (the same XSLT with one more template, for internal/text()):

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
 <xsl:mode streamable="yes"/>
 <xsl:output method="xml"/>
 <xsl:template match="email/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="address/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="birthday/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="country/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="external/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="internal/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="name/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="phone/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="city/text()"><xsl:value-of select="current()"/></xsl:template>
 <xsl:template match="id/text()"><xsl:value-of select="concat(current(), '_ID')"/></xsl:template>
 <xsl:template match="." >
   <xsl:copy validation="preserve">
     <xsl:apply-templates select="@*" />
     <xsl:apply-templates select="node()" />
   </xsl:copy>
 </xsl:template>
 </xsl:stylesheet>

I run it with the following command line:

java -cp Saxon-EE-9.8.0-14.jar  net.sf.saxon.Transform -s:test.xml -xsl:concat_not_working.xsl

The working XSLT correctly appends _ID to the value of the id element in the output, whereas the non-working XSLT does not perform any transformation at all.

One more piece of information: if I run without the license (and therefore without streaming), both stylesheets work!

I'm using Saxon-EE 9.8.0-14 with a trial license: could this be an undocumented limitation of the trial license?

Answers (1)

Michael Kay

Your theory that the failure occurs with 10 or more rules turns out to be spot on. When there are more than 10 rules matching the same node-kind/node-name combination (in this case, all text nodes), Saxon-EE attempts to avoid a linear search of all the rules by looking for criteria that subsets of the rules share in common. In this case it is looking to see whether it can group the rules according to a precondition based on the parent of the text node.

At this stage there is a flaw in the logic; it carefully works out that each rule is in a group of 1 (no two parent conditions are the same), which should mean that it then abandons the optimization attempt. But it doesn't abandon it; it carries on. This shouldn't matter, because the optimization should work correctly even though it was pointless.

The optimization isn't working correctly because, on the streaming path for xsl:apply-templates, the context data for evaluating the rule preconditions isn't being initialized properly, which leads the rule matcher to conclude that the preconditions aren't satisfied.

So you've hit a bug that, as you surmised, applies when you have a set of 10 or more template rules in a streaming mode when the rules all match nodes that have the same node-kind and node-name.

Running unlicensed bypasses the bug for two reasons: it deactivates the optimization of rule chains, and it deactivates streaming.

As a workaround, simply remove the /text() from each of your template rules.
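Applied to the concat() version of the stylesheet from the question, the workaround might look like the untested sketch below. One assumption to flag: once a rule matches the element rather than its text node, it has to rebuild the element itself with xsl:copy, because simply deleting /text() and keeping the original body would replace, say, the email element with its bare text. The ten rules then match ten different element names instead of all matching text nodes, so they should no longer form one long chain of text-node rules. Note that xsl:copy on an element does not copy its attributes; the sample elements have none.

```xml
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:mode streamable="yes"/>
 <xsl:output method="xml"/>
 <!-- Rules match elements, not element/text(): each one re-creates the element
      (without attributes) and consumes its content once via value-of -->
 <xsl:template match="email"><xsl:copy><xsl:value-of select="."/></xsl:copy></xsl:template>
 <xsl:template match="address"><xsl:copy><xsl:value-of select="."/></xsl:copy></xsl:template>
 <xsl:template match="birthday"><xsl:copy><xsl:value-of select="."/></xsl:copy></xsl:template>
 <xsl:template match="country"><xsl:copy><xsl:value-of select="."/></xsl:copy></xsl:template>
 <xsl:template match="external"><xsl:copy><xsl:value-of select="."/></xsl:copy></xsl:template>
 <xsl:template match="internal"><xsl:copy><xsl:value-of select="."/></xsl:copy></xsl:template>
 <xsl:template match="name"><xsl:copy><xsl:value-of select="."/></xsl:copy></xsl:template>
 <xsl:template match="phone"><xsl:copy><xsl:value-of select="."/></xsl:copy></xsl:template>
 <xsl:template match="city"><xsl:copy><xsl:value-of select="."/></xsl:copy></xsl:template>
 <!-- The rule that actually changes data: appends _ID to the id value -->
 <xsl:template match="id"><xsl:copy><xsl:value-of select="concat(., '_ID')"/></xsl:copy></xsl:template>
 <!-- Unchanged catch-all (default priority -1): copies everything else and recurses -->
 <xsl:template match="." >
   <xsl:copy validation="preserve">
     <xsl:apply-templates select="@*" />
     <xsl:apply-templates select="node()" />
   </xsl:copy>
 </xsl:template>
</xsl:stylesheet>
```

With the sample input above, the id elements should then come out as 10_ID and 9_ID, while all other elements are copied unchanged.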

Logged as a bug here: https://saxonica.plan.io/issues/3901

Unless you indicate otherwise, I will submit a new test case based on your test data and stylesheet to the W3C test suite for XSLT 3.0.