MrD at KookerellaLtd

Reputation: 2807

Strict compilation of XSLT, but without validating the input XML on application

I have a schema (as per previous questions, but this time it has some "required" attributes on FILLEDSQUARETYPE).

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
    xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" vc:minVersion="1.1">

  <xs:complexType name="SQUARETYPE">
    <xs:sequence>
      <xs:element name="contains">
        <xs:complexType>
          <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element ref="SQUARE"/>
            <xs:element ref="TRIANGLE"/>
          </xs:choice>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="kind"/>
    <xs:attribute name="width" type="xs:int"/>
    <xs:attribute name="x" type="xs:int"/>
    <xs:attribute name="y" type="xs:int"/>
  </xs:complexType>
  <xs:complexType name="FILLEDSQUARETYPE">
    <xs:sequence>
      <xs:element name="contains">
        <xs:complexType>
          <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element ref="SQUARE"/>
            <xs:element ref="TRIANGLE"/>
          </xs:choice>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="kind"/>

    <xs:attribute name="colour" type="xs:string" use="required"/>
    <xs:attribute name="width" type="xs:int"  use="required"/>
    <xs:attribute name="x" type="xs:int"  use="required"/>
    <xs:attribute name="y" type="xs:int"  use="required"/>
  </xs:complexType>
  <xs:complexType name="TRIANGLETYPE">
    <xs:sequence>
      <xs:element name="contains">
        <xs:complexType>
          <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element ref="SQUARE"/>
            <xs:element ref="TRIANGLE"/>
          </xs:choice>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="rotation" type="xs:int"/>
    <xs:attribute name="x" type="xs:int"/>
    <xs:attribute name="y" type="xs:int"/>
  </xs:complexType>
  <xs:element name="SQUARE">
    <xs:alternative test="@kind = 'FILLEDSQUARETYPE'" type="FILLEDSQUARETYPE"/>
    <xs:alternative test="@kind = 'SQUARETYPE'" type="SQUARETYPE"/>
    <xs:alternative type="xs:error"/>
  </xs:element>
  <xs:element name="TRIANGLE">
    <xs:alternative type="TRIANGLETYPE"/>
  </xs:element>
  <xs:element name="rootShape">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="SQUARE"/>
        <xs:element ref="TRIANGLE"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

I have an XSLT stylesheet that compiles against this XSD with 0 warnings. Note that it uses "element(tag, type)" in its match pattern.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="xs msxsl"
    version="2.0">

  <xsl:output method="xml" indent="yes" encoding="UTF-8" version="1.0"/>

  <xsl:import-schema schema-location="MessingAbout.xsd"/>
  <xsl:template match="/">
    <xsl:apply-templates select="SQUARE"/>
  </xsl:template>
  <xsl:template match="element(SQUARE,FILLEDSQUARETYPE)">
    <foo>
      <xsl:value-of select="@colour"/>
    </foo>
  </xsl:template>
</xsl:stylesheet>

I can apply that transform to XML documents like this:

<?xml version="1.0" encoding="utf-8" ?>
<SQUARE x="1" y="2" width="234" kind="FILLEDSQUARETYPE" colour="red">
  <contains/>
</SQUARE>

and get

<?xml version="1.0" encoding="UTF-8"?>
<foo>red</foo>

HURRAY!

But this is a simplification of my real-world scenario. There, the source system is optimised to export only the XML that the transform requires, whereas the schema describes a utopian world where all data is exported, including mandatory data that the transform does not need.

So, for example, the attributes "width", "x" and "y" aren't required by the transform and would be excluded, like this:

<?xml version="1.0" encoding="utf-8" ?>
<SQUARE kind="FILLEDSQUARETYPE" colour="red">
  <contains/>
</SQUARE>

If we now apply the transformation (with SchemaValidationMode.Strict) to this XML, Saxon complains, because it automatically validates the input:

Validation error on line 2 column 46
  FORG0001: Required attribute @Q{}y is missing on element <SQUARE>
  Validating /SQUARE[1]
  See http://www.w3.org/TR/xmlschema11-1/#cvc-complex-type clause 4
Validation error on line 2 column 46
  FORG0001: Required attribute @Q{}x is missing on element <SQUARE>
  Validating /SQUARE[1]
  See http://www.w3.org/TR/xmlschema11-1/#cvc-complex-type clause 4
Validation error on line 2 column 46
  FORG0001: Required attribute @Q{}width is missing on element <SQUARE>
  Validating /SQUARE[1]
  See http://www.w3.org/TR/xmlschema11-1/#cvc-complex-type clause 4
Validation error on line 4 column 10
  XTTE1510: Three validation errors were reported. First error: Required attribute @Q{}y is
  missing on element <SQUARE>

These errors are irrelevant to my XSLT, but in my scenario they are a problem. Ideally I'd like to turn off this behaviour so that Saxon doesn't try to validate things that are irrelevant to the execution of the XSLT.

Any ideas?

(I can obviously create a schema for the subset of data that IS exported, but this is quite onerous, and it has the nasty implication that multiple types must exist to describe what is effectively the same utopian data whenever different subsets of children are exported. I could also make everything optional, but that massively diminishes the value of the type check.)

Upvotes: 0

Views: 243

Answers (1)

Michael Kay

Reputation: 163595

The whole idea of telling the XSLT compiler about the schema is so that it knows what to expect when it sees the data; the compiler can generate code that makes assumptions about what the data will be like. If the data doesn't conform to the schema, then that negates the whole idea.

It's hard to be specific about exactly what would go wrong if invalid data were accepted, but the XSLT optimiser makes a lot of use of schema knowledge. To take a simple example, if your stylesheet does <xsl:if test="exists(*)">, and the schema says the element will always have children, then the XSLT processor might well have optimised that to "if true".
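
To make that concrete with the schema from the question, here is a minimal sketch of a stylesheet whose behaviour could silently change if invalid input were let through. It is an illustration only: the `xsl:choose` and the "unknown" fallback are hypothetical additions, and whether any particular processor actually performs this rewrite is an assumption, not something confirmed here.

```xml
<!-- Illustrative sketch only; assumes the MessingAbout.xsd schema from the question. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

  <xsl:import-schema schema-location="MessingAbout.xsd"/>

  <xsl:template match="element(SQUARE, FILLEDSQUARETYPE)">
    <foo>
      <!-- The schema declares @width as use="required" on FILLEDSQUARETYPE,
           so a schema-aware processor is entitled to treat this test as
           always true and could discard the xsl:otherwise branch. -->
      <xsl:choose>
        <xsl:when test="exists(@width)">
          <xsl:value-of select="@width"/>
        </xsl:when>
        <xsl:otherwise>unknown</xsl:otherwise>
      </xsl:choose>
    </foo>
  </xsl:template>
</xsl:stylesheet>
```

If a SQUARE without @width were accepted as a FILLEDSQUARETYPE, the author would expect "unknown", but an optimised stylesheet might emit an empty `<foo/>` instead, because the fallback branch may no longer exist in the compiled code.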

Upvotes: 0
