Sean Champ

Reputation: 361

Querying XSD-valid XML for the original XML schema

Given a schema document (XSD format) such as the MODS 3.5 schema (US Library of Congress, LoC), and an XML document known to be valid according to that schema, such as the metadata for the Antitrust & Competition Policy Blog archives 2007 (HTML view) from the LoC Law Blawgs Web Archive, is there a Java API that would allow a Java program to query the XML document for the XML schema data types that the document's elements are instances of?

It may seem as though I may have XML schemas and UML models confused. I'm thinking of an XML schema as representing something like a UML model (M1), and an XML document as something like user data (M0) representing instances of UML model elements. If it is similarly possible to query an XML element to determine the XML schema data type or element definition that the element derives from or conforms to in the parse tree, I think it could make a nice feature for a sequencer for ModeShape.

I think the idea is essentially this: in a ModeShape JCR repository, each JCR node representing an XML element of a sequenced XML document could reference a JCR node representing that element's XML schema data type. The type's JCR node would be defined in the sequencing of the schema used by the document, such as would have been produced by the ModeShape XSD sequencer.

I'm simply not certain whether there is a Java API for determining the XML schema element that an element of a valid XML document conforms to in the parse tree, when the XML document is validated according to an XML schema. I'm under the impression that such a computation is possible. I simply wonder: might there already be an API for that?

Alternately, there is UML...

Upvotes: 0

Views: 222

Answers (1)

Michael Kay

Reputation: 163418

The answer is yes.

In terms of standards, validating an XML document against a schema produces a PSVI (post-schema-validation infoset), and the PSVI decorates nodes in the parse tree with information about what types they were validated against.

In terms of concrete implementation, if you use the JAXP Validation API you can either generate a DOM augmented with TypeInfo that tells you the type of each node, or you can use a SAX-based validation pipeline in which type information is made available through a TypeInfoProvider.
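As a rough illustration of the SAX-based route, the sketch below pushes a document through a ValidatorHandler and asks its TypeInfoProvider for the type of each element as it starts. The class name ElementTypeLister, the method listElementTypes, and the tiny book schema are made up for this example; a real program would load the MODS schema and document from files or URLs instead. It assumes the JDK's built-in JAXP implementation, and the exact type names reported for anonymous types may vary between implementations.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for this example only.
class ElementTypeLister {

    // Validates xml against xsd and returns one "element : type" entry per element.
    static List<String> listElementTypes(String xsd, String xml) throws Exception {
        SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));

        ValidatorHandler validator = schema.newValidatorHandler();
        TypeInfoProvider typeInfo = validator.getTypeInfoProvider();
        List<String> result = new ArrayList<>();

        // The provider may only be queried from the startElement/endElement
        // callbacks of the handler that sits downstream of the validator.
        validator.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                TypeInfo type = typeInfo.getElementTypeInfo();
                String typeName = (type == null) ? "unknown" : type.getTypeName();
                result.add(localName + " : " + typeName);
            }
        });

        // Parser -> validator -> our handler. Namespace awareness is required.
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();
        reader.setContentHandler(validator);
        reader.parse(new InputSource(new StringReader(xml)));
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative schema and instance, not taken from MODS.
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='book' type='BookType'/>"
          + "<xs:complexType name='BookType'><xs:sequence>"
          + "<xs:element name='title' type='xs:string'/>"
          + "<xs:element name='year' type='xs:gYear'/>"
          + "</xs:sequence></xs:complexType>"
          + "</xs:schema>";
        String xml = "<book><title>Example</title><year>2007</year></book>";
        for (String line : listElementTypes(xsd, xml)) {
            System.out.println(line);
        }
    }
}
```

For the DOM route, the analogous call is Element.getSchemaTypeInfo() on nodes of a DOM produced by validation. TypeInfo also offers getTypeNamespace() and isDerivedFrom(...), which could be used to link an element to the schema type it was validated against.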

You can also do this using schema-aware XSLT and XQuery; after a validation operation, nodes are augmented with a "type annotation", which you can interrogate using the "instance of" test. If you use Saxon, you can use the extension functions saxon:type() or saxon:type-annotation() to explore further:

http://www.saxonica.com/documentation/#!functions/saxon/type
http://www.saxonica.com/documentation/#!functions/saxon/type-annotation

A limitation of the XSLT/XQuery approach is that it only works if validation succeeds. The DOM/SAX interfaces also provide information in cases where validation fails.

Upvotes: 1
