Michael Harness

Reputation: 101

XSD, order doesn't matter, must account for xml changes

I've looked into several ways to do this and none seem to work for me.

I have the following xml example:

 <Entry>
      <Node1></Node1>
      <Node2></Node2>
      <Node3></Node3>
 </Entry>

I need the xsd to account for the following changes without breaking:

1) New node added to the end of the xml:

 <Entry>
      <Node1></Node1>
      <Node2></Node2>
      <Node3></Node3>
      <Node4></Node4>
 </Entry>

2) New node added in the middle of the xml:

 <Entry>
      <Node1></Node1>
      <Node4></Node4>
      <Node2></Node2>
      <Node3></Node3>
 </Entry>

3) Combination of both:

 <Entry>
      <Node1></Node1>
      <Node2></Node2>
      <Node5></Node5>
      <Node3></Node3>
      <Node4></Node4>
 </Entry>

The XSD I'm currently using is built on <xs:sequence>, and obviously that fails as soon as the order changes or a new node appears. I've tried <xs:any>, <xs:all>, and <xs:choice>, and none of them seem to validate properly.

The use case for this is as follows: if a developer updates an API that returns the aforementioned XML, I do not want to have to create a new XSD and recompile an application to allow for the changes.

Any and all information is greatly appreciated.


Here is the xsd that I'm using for this.

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="APIName">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="entry" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="Node1"/>
              <xs:element type="xs:string" name="Node2"/>
              <xs:element type="xs:string" name="Node3"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute type="xs:string" name="NextPageLink"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Upvotes: 2

Views: 1150

Answers (3)

Petru Gardea

Reputation: 21658

Strictly speaking, you did not provide enough information for someone to reliably answer your question, so I disagree with @Abel's short answer.

Let's take this schema:

<?xml version="1.0" encoding="utf-8" ?>
<!-- XML Schema generated by QTAssistant/XSD Module (http://www.paschidev.com) -->
<xsd:schema targetNamespace="http://tempuri.org/XMLSchema.xsd" xmlns="http://tempuri.org/XMLSchema.xsd" elementFormDefault="qualified" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xtm="http://paschidev.com/schemas/metadata/xtm">
    <xsd:element name="root">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:element name="node1" type="xsd:string"/>
    <xsd:element name="node2" type="xsd:int"/>
    <xsd:element name="node3" type="xsd:date"/>
</xsd:schema>

The trick is the processContents="lax" on the xsd:any. Judging by your test cases, the schema above would satisfy your requirement: node1, node2, and node3 would validate properly in any order you might arrange them, interspersed or not with new or existing content. I have to warn you that certain types of changes would still break validation, e.g. changing the content model of a node (node2 from int to string, etc.).

A true answer would have to take into account many things, including the technology you're using to process your XML and how much validation you really want to do. For example, certain XSD-to-code binding technologies have built-in support for ignoring XML that is new, "appended" at the "end" of the content model defined by the version of the schema used to generate the code (.NET XML deserializers). Other stacks (e.g. JAXB) support custom error handlers, which let users control the unmarshalling of unknown content. An XPath-based processing model may be even less sensitive, and would even let you validate selectively, say at node level.

I would say the correct answer is: "it depends"; I do hope my sample schema and the technology references give you an idea of what else you might want to clarify, to help us give you a more precise answer.

Based on your update:

This schema:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="APIName">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="entry" maxOccurs="unbounded" minOccurs="0">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
            <xs:attribute type="xs:string" name="NextPageLink"/>
        </xs:complexType>
    </xs:element>
    <xs:element type="xs:string" name="Node1"/>
    <xs:element type="xs:string" name="Node2"/>
    <xs:element type="xs:string" name="Node3"/>
</xs:schema>

Would validate:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!-- Sample XML generated by QTAssistant (http://www.paschidev.com) -->
<APIName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" NextPageLink="NextPageLink1">
    <entry>
        <Node2>s2</Node2>
        <Node5>s5</Node5>
        <Node1>s1</Node1>
        <Node3>s3</Node3>
    </entry>
</APIName>

And this:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!-- Sample XML generated by QTAssistant (http://www.paschidev.com) -->
<APIName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" NextPageLink="NextPageLink1">
    <entry>
        <Node2>s2</Node2>
        <Node1>s1</Node1>
        <Node2>s2</Node2>
        <Node5>s5</Node5>
        <Node3>s3</Node3>
    </entry>
</APIName>

This would validate too, but it would not meet the expectations set by your original schema, which requires a Node3:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!-- Sample XML generated by QTAssistant (http://www.paschidev.com) -->
<APIName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" NextPageLink="NextPageLink1">
    <entry>
        <Node1>s1</Node1>
        <Node2>s2</Node2>
        <Node5>s5</Node5>
    </entry>
</APIName>

This is where XSD 1.0 hits a limit. XSD 1.1 can certainly help you. If you can't use it, I would still use XSD 1.0 and compensate with a quick check in code to make sure an instance of each of Node1, Node2 and Node3 is present; that beats all other recommendations in terms of work. Based on my personal experience, for what it can do, XSD would be more efficient than having to write equivalent code by hand.
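
To illustrate the XSD 1.1 route, here is a minimal sketch that reuses the element names from the question. It assumes an XSD 1.1 processor: xs:any inside xs:all and the notQName attribute are 1.1 features, so an XSD 1.0 validator would reject this schema. Treat it as a starting point rather than a tested drop-in.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: requires an XSD 1.1 processor. -->
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="APIName">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="entry" minOccurs="0" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:all>
                            <!-- Known nodes: each required exactly once, in any order. -->
                            <xs:element name="Node1" type="xs:string"/>
                            <xs:element name="Node2" type="xs:string"/>
                            <xs:element name="Node3" type="xs:string"/>
                            <!-- Unknown nodes: any number, anywhere among the known ones.
                                 notQName="##definedSibling" keeps the wildcard from
                                 matching Node1, Node2 or Node3 a second time. -->
                            <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax" notQName="##definedSibling"/>
                        </xs:all>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
            <xs:attribute type="xs:string" name="NextPageLink"/>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

With this content model, an entry missing Node3 should be rejected, while an entry carrying an extra Node4 or Node5 in any position should still pass.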

Upvotes: 2

Abel

Reputation: 57169

The short answer is: this cannot be done in XSD.

The reason this cannot be done is that you would violate the Unique Particle Attribution constraint, which is an expensive way of saying that your data model is not deterministic.

Suppose you have an XSD that allows:

<Entry>
   <any node>
   <Node1>
   <any node>
</Entry>

and you only want to require that Node1 is there. Suppose now that someone sends you:

<Entry>
   <Node1>
   <Node2>
   <Node1>
</Entry>

How is the data-mapping going to determine whether the first or the second Node1 is the right one?

You may argue that you want to require the other elements to be of different names, but such a constraint is not possible (granted, technically it is, but there is no such mechanism in XSD), unless you design for extensibility, which requires the extra elements to be in a different namespace. XFront has an excellent article on some methods to create such an extensibility and another article on Variable Content Containers, which is the way I would choose in your case. They are good reads, I can highly recommend them.

So what can you do?

  1. Make sure that the model you are going to conceive is deterministic
    1. I.e., you can demand that all non-conformant elements are at the beginning, or at the end, or in the middle
    2. This would require the elements that you expect there to have a minOccurs higher than zero.
  2. Choose another schema language; what you want can be expressed in RelaxNG or Schematron
  3. Get rid of auto-mapping and construct your schema with XSD 1.1 using xs:assert, which allows the validation the way you want it, but will require you to create the model-to-object mapping by hand (after all, it becomes non-deterministic, so no program in the world can map it automatically for you)
  4. Go the "XSD way" and use one of the best-practices in the links above. This will require people to extend by using other namespaces only (this is definitely a better idea anyway!), but you will still be able to validate and map the XSD.
  5. Simplify your model, add an optional element extensions with content model xs:any which will require your data providers to put any extra stuff in there.
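
As a rough sketch of option 5, the schema below keeps the three known nodes in a fixed sequence and adds an optional container for everything else. The name `extensions` comes from the list above; the rest of the layout mirrors the schema in the question and is only one way to arrange it. This stays within XSD 1.0 and remains deterministic, because the wildcard is the only particle inside `extensions`.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="APIName">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="entry" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- Known nodes: required, in a fixed order. -->
              <xs:element name="Node1" type="xs:string"/>
              <xs:element name="Node2" type="xs:string"/>
              <xs:element name="Node3" type="xs:string"/>
              <!-- Optional container: data providers put any new nodes here. -->
              <xs:element name="extensions" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="NextPageLink" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The trade-off is that new nodes can no longer appear between the known ones; they have to be nested under `extensions`, which the API developers would need to agree to.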

Upvotes: 3

Armando SM

Reputation: 142

Do the nodes need to have those names? I would suggest using Node as the element name and specifying the index or number as an attribute or subelement:

<Entry> <Node order="1"></Node> <Node order="2"></Node> <Node order="5"></Node> <Node order="3"></Node> <Node order="4"></Node> </Entry>

or

<Entry> <Node><order>1</order></Node> <Node><order>2</order></Node> <Node><order>5</order></Node> <Node><order>3</order></Node> <Node><order>4</order></Node> </Entry>
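
For the attribute form, a schema could look roughly like the sketch below. It assumes that each Node carries plain text content and that `order` is a required positive integer; neither assumption comes from the question, so adjust the types as needed.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Entry">
    <xs:complexType>
      <xs:sequence>
        <!-- Any number of Node elements; adding a new one needs no schema change. -->
        <xs:element name="Node" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <!-- Assumed type: the index that used to be part of the element name. -->
                <xs:attribute name="order" type="xs:positiveInteger" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Since every node shares one element name, new nodes can appear anywhere without touching the XSD, at the cost of no longer being able to require specific nodes through the schema alone.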

Upvotes: -1
