Vokail

Reputation: 691

Java Saxon valid subtree of an xsd

Using Saxon-HE, I want to parse an XSD file and, given an element name, extract the subtree for that element together with every simpleType and complexType it requires (whether reached through a type reference or through ref). For example, given a file like:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="PSS" xmlns=""
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element maxOccurs="unbounded" name="Assistito">
        <xs:complexType>
            <xs:sequence>
                <xs:element minOccurs="0" name="IDCittadino" type="IDCittadino"/>
                <xs:element maxOccurs="unbounded" name="Struttura">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="CodiceStruttura" type="CodiceStruttura"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:simpleType name="IDCittadino">
        <xs:restriction base="xs:string">
            <xs:minLength value="20"/>
            <xs:maxLength value="32"/>
        </xs:restriction>
    </xs:simpleType>   
    <xs:simpleType name="CodiceStruttura">
        <xs:restriction base="xs:string">
            <xs:minLength value="1"/>
            <xs:maxLength value="8"/>
        </xs:restriction>
    </xs:simpleType>   
</xs:schema>

I need to get the subtree of the element with name="Struttura", which also requires the simpleType with name="CodiceStruttura", for example:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="PSS" xmlns=""
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Struttura">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="CodiceStruttura" type="CodiceStruttura"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:simpleType name="CodiceStruttura">
        <xs:restriction base="xs:string">
            <xs:minLength value="1"/>
            <xs:maxLength value="8"/>
        </xs:restriction>
    </xs:simpleType>   
</xs:schema>

Notes

Thanks in advance.

Upvotes: 1

Views: 354

Answers (2)

Michael Kay

Reputation: 163262

Generally I would not advise working from the raw XML of a schema document. I would recommend working from the schema component model produced by a schema compiler; otherwise you will find yourself either (a) replicating all the work done by the schema compiler, or (b) getting it wrong and not handling all schemas correctly.

There are a number of ways you could get programmatic access to the schema component model. Saxon offers two approaches (but both need Saxon-EE): (a) you can output an XML representation of the schema component model using Saxon's schema processor (use com.saxonica.Validate -xsd:schema.xsd -scmout:schema.scm); (b) you can access the schema component model from XSLT or XQuery using a set of extension functions, starting with saxon:schema().

As an alternative, the Xerces schema processor offers a Java API to its internal schema component model, and you may be able to access this API using Xalan (or indeed Saxon) extension functions.

In all these cases the "schema component model" that is made available is very close to the abstract schema component model described in the W3C XSD specifications. The key differences from working with a raw schema document include (a) all imports, includes, and cross-component references have been resolved; (b) all defaults have been expanded; (c) model groups and attribute groups have been expanded.
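To make those differences concrete, here is a small made-up variant of the schema in the question. The group name StrutturaContent and the attribute note are invented for illustration and do not appear in the original schema. Code that only follows type attributes in the raw document would miss the dependency that goes through xs:group, and it would also have to know the defaults that are not written down anywhere.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical named model group: "Struttura" reaches CodiceStruttura
       only through xs:group/@ref, not through a type attribute of its own. -->
  <xs:group name="StrutturaContent">
    <xs:sequence>
      <!-- minOccurs and maxOccurs are absent, so both default to 1. -->
      <xs:element name="CodiceStruttura" type="CodiceStruttura"/>
    </xs:sequence>
  </xs:group>

  <xs:element name="Struttura">
    <xs:complexType>
      <xs:group ref="StrutturaContent"/>
      <!-- Hypothetical attribute with no type and no use:
           the type defaults to xs:anySimpleType and use to "optional". -->
      <xs:attribute name="note"/>
    </xs:complexType>
  </xs:element>

  <xs:simpleType name="CodiceStruttura">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="8"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

In a schema component model the group reference would already be expanded into the content model of Struttura and the defaults would be explicit, so none of this has to be reimplemented by hand.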

Upvotes: 2

Martin Honnen

Reputation: 167436

I am not sure Saxon 9 HE, which has no schema parsing, no schema-aware XSLT or XQuery processing, and no schema object model, is a suitable tool for that task. The only thing it offers is XSLT 3 or XQuery 3.1, so you would need to use the features of a language like XSLT 3, such as xsl:key, to follow any type or ref cross-references.

Trying to implement that for type references, I have created

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

  <xsl:param name="element-to-search" as="xs:QName" select="xs:QName('Struttura')"/>

  <xsl:key name="element-ref" match="xs:element" use="resolve-QName(@name, .)"/>

  <xsl:key name="type-ref" match="xs:complexType | xs:simpleType" use="resolve-QName(@name, .)"/>

  <xsl:variable name="start-element" select="key('element-ref', $element-to-search)"/>

  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:mode name="flatten" on-no-match="shallow-skip"/>

  <xsl:template match="/*">
      <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:apply-templates select="$start-element"/>
          <xsl:apply-templates select="$start-element//xs:element" mode="flatten"/>
      </xsl:copy>
  </xsl:template>

  <xsl:template match="xs:element[@type and key('type-ref', resolve-QName(@type, .))]" mode="flatten">
      <xsl:apply-templates select="key('type-ref', resolve-QName(@type, .))"/>
  </xsl:template>

</xsl:stylesheet>

which for your sample input creates (online example at https://xsltfiddle.liberty-development.net/eiZQaFw) the output

<xs:schema xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           id="PSS">
   <xs:element maxOccurs="unbounded" name="Struttura">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="CodiceStruttura" type="CodiceStruttura"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
   <xs:simpleType name="CodiceStruttura">
      <xs:restriction base="xs:string">
         <xs:minLength value="1"/>
         <xs:maxLength value="8"/>
      </xs:restriction>
   </xs:simpleType>
</xs:schema>

So that approach has found the referenced simple type and added it. With a separate key for ref attributes you might be able to add referenced elements as well, and it might be doable (perhaps with some recursion) if you have a single schema module without includes, imports, or the more advanced schema features Michael Kay has mentioned. Whether it would process schemas robustly I am not sure: the schema language is complex, and simply copying as you asked for also creates problems. As you can see above, maxOccurs="unbounded" has been copied by the simple approach, and you would need to add a template suppressing it for anything you put at the top level.
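As a rough, untested sketch of those two additions, the following fragments could be placed inside the xsl:stylesheet element above. They are not a complete stylesheet on their own. The key name global-element is my own choice, and like the original keys it assumes a single schema module with no target namespace. It follows ref one level only and does not guard against duplicates or recursive references.

```xml
<!-- Drop minOccurs/maxOccurs from the element that is promoted to the top
     level; occurrence attributes only make sense on local declarations. -->
<xsl:template match="xs:element[. intersect $start-element]/@minOccurs
                   | xs:element[. intersect $start-element]/@maxOccurs"/>

<!-- Hypothetical key: global element declarations (children of xs:schema),
     indexed by name, so that ref attributes can be resolved against them. -->
<xsl:key name="global-element" match="xs:schema/xs:element"
         use="resolve-QName(@name, .)"/>

<!-- In flatten mode, copy the global declaration an element reference points at.
     The copy is done in the default (shallow-copy) mode. -->
<xsl:template match="xs:element[@ref]" mode="flatten">
  <xsl:apply-templates select="key('global-element', resolve-QName(@ref, .))"/>
</xsl:template>
```

To go further than one level you would have to apply the flatten mode again to whatever has just been copied, while keeping track of the components already emitted, which is exactly where the schema component model starts to look more attractive.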

I also suspect it is easy to construct something invalid by pulling inlined elements up to the top level: what happens if you choose an inline, locally scoped element foo as the starting point, but a globally declared foo element that is referenced elsewhere exists as well?

Upvotes: 1
