AndyZ

Reputation: 615

XSD unique constraint based on multiple elements including an optional element

I have defined a unique constraint on multiple elements, following the earlier question "define unique constraint based on multiple elements".

The unique constraint now looks like this:

<xs:unique name="specieSizeGroupLengthAssortment">
    <xs:selector xpath="DataRow"/>
    <xs:field xpath="Specie"/>
    <xs:field xpath="Group"/>
    <xs:field xpath="Length"/>
    <xs:field xpath="Type"/>
</xs:unique>

Now imagine the element "Type" is optional. My searching and testing so far confirm that this unique constraint only applies to DataRow elements that contain every subelement listed in the constraint. For example:

This should be invalid due to the unique constraint:

<DataRow>
   <Specie>A</Specie>
   <Length>100</Length>
   <Group>A</Group>
</DataRow>

<DataRow>
   <Specie>A</Specie>
   <Length>100</Length>
   <Group>A</Group>
</DataRow>

This should be valid:

<DataRow>
   <Specie>A</Specie>
   <Length>100</Length>
   <Group>A</Group>
</DataRow>

<DataRow>
   <Specie>A</Specie>
   <Length>100</Length>
   <Group>A</Group>
   <Type>D</Type>
</DataRow>

This should be invalid:

<DataRow>
   <Specie>A</Specie>
   <Length>100</Length>
   <Group>A</Group>
   <Type>D</Type>
</DataRow>

<DataRow>
   <Specie>A</Specie>
   <Length>100</Length>
   <Group>A</Group>
   <Type>D</Type>
</DataRow>

Is it possible to create an XSD schema that will do this kind of validation?

Upvotes: 3

Views: 3521

Answers (2)

Michael Kay

Reputation: 163468

I think I got it the wrong way around. Unique constraints allow absent fields; Key constraints do not.

The language is very obscure, but is easier to understand in the XSD 1.1 version because some notes have been added. I don't think there is any (intentional) change in functionality between the two versions.

  • The {selector}, with the element information item as the context node, evaluates to a node-set (as defined in [XPath]). [Definition:] Call this the target node set.

  • Call the subset of the ·target node set· for which all the {fields} evaluate to a node-set with exactly one member which is an element or attribute node with a simple type the qualified node set.

So if some selected node has a missing value for one of its fields, then this node is not part of the qualified node-set.

  • If the {identity-constraint category} is unique, then no two members of the ·qualified node set· [may] have ·key-sequences· whose members are pairwise equal, as defined by Equal in [XML Schemas: Datatypes].

This means that for "unique", selected nodes for which a field is absent are simply ignored.

  • If the {identity-constraint category} is key, then all of the following must be true: 4.2.1 The ·target node set· and the ·qualified node set· are equal, that is, every member of the ·target node set· is also a member of the ·qualified node set· and vice versa.

This means that for "key", the data is invalid if one of the fields is missing.

I'm left concluding that the original schema as posted almost does what is required, except that the first example is not invalid: for both selected nodes there is a field missing, therefore neither selected node is included in the qualified node-set, therefore the unique constraint has no effect. To make this invalid, you will need a second "unique" constraint that only lists the first three fields. But then you will get a validity error if these three fields are the same, even if the fourth field is present.
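
As a rough sketch of that second constraint, it could sit next to the original one in the same element declaration. The name "specieGroupLength" is made up for illustration and does not come from the question:

```xml
<!-- Hypothetical second constraint: covers only the three mandatory fields,
     so rows without Type are no longer skipped by the uniqueness check. -->
<xs:unique name="specieGroupLength">
    <xs:selector xpath="DataRow"/>
    <xs:field xpath="Specie"/>
    <xs:field xpath="Group"/>
    <xs:field xpath="Length"/>
</xs:unique>
```

With this in place the first example is rejected, but so is the second one, because the constraint never looks at Type.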

In XSD 1.1 of course you can solve the problem with an assertion, along the lines

test="count(DataRow) = count(distinct-values(DataRow/concat(
                         Specie, '|', Length, '|', Group, '|', Type)))"
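
For context, here is a minimal sketch of where such an assertion might sit in a complete XSD 1.1 schema. The container element name "DataRows", the xs:string types, and the absence of a target namespace are all assumptions, since the question does not show that part of the schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Requires an XSD 1.1 processor; xs:assert is not available in XSD 1.0. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- "DataRows" is a hypothetical container name, not taken from the question. -->
    <xs:element name="DataRows">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="DataRow" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:sequence>
                            <!-- Element types are assumed to be xs:string. -->
                            <xs:element name="Specie" type="xs:string"/>
                            <xs:element name="Length" type="xs:string"/>
                            <xs:element name="Group" type="xs:string"/>
                            <!-- Type is the optional element. -->
                            <xs:element name="Type" type="xs:string" minOccurs="0"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
            <!-- Valid only if every row yields a distinct concatenated key;
                 concat() treats an absent Type as an empty string. -->
            <xs:assert test="count(DataRow) = count(distinct-values(DataRow/concat(
                             Specie, '|', Length, '|', Group, '|', Type)))"/>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

Under these assumptions the three examples from the question behave as requested: the first and third each produce two identical keys and fail the assertion, while the second produces two different keys and passes. Note that a row with no Type and a row with an empty Type would compare equal, and values containing '|' could collide, so the separator may need adjusting for real data.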

Upvotes: 2

Ian Roberts

Reputation: 122394

The specification states that each field in a unique constraint

must identify a single node (element or attribute) whose content or value, which must be of a simple type, is used in the constraint.

XML Schema part 1: Structures, §3.11.1, my bold.

So it appears that you can't use optional elements in a uniqueness constraint. This is backed up by the step-by-step rules for validating these constraints (§3.11.4):

3 For each node in the ·target node set· all of the {fields}, with that node as the context node, evaluate to either an empty node-set or a node-set with exactly one member, which must have a simple type. [Definition:] Call the sequence of the type-determined values (as defined in [XML Schemas: Datatypes]) of the [schema normalized value] of the element and/or attribute information items in those node-sets in order the key-sequence of the node.
4 [Definition:] Call the subset of the ·target node set· for which all the {fields} evaluate to a node-set with exactly one member which is an element or attribute node with a simple type the qualified node set. The appropriate case among the following must be true:
4.1 If the {identity-constraint category} is unique, then no two members of the ·qualified node set· have ·key-sequences· whose members are pairwise equal, as defined by Equal in [XML Schemas: Datatypes].
[...]

This explicitly defines the uniqueness check as applying only to the "qualified node set", i.e. those nodes matching the selector which have values for all their fields.

Upvotes: 3
