Nic Wortel

Reputation: 11423

How do I extend the Atom schema?

I am designing an XML product feed that will be used by a number of web shops in order to publish their product data. The structure of this product feed will be based on the Atom XML standard, similar to Google's Atom product feed. I will publish an XSD file that can be used to validate product feeds.

Basically, each <entry> element will represent a product. I will need to add some child elements to the <entry> element, which will contain data such as the product price, shipping costs, etc.

The problem lies with creating the XSD file. I'm not sure how to extend the Atom standard in such a way that I can add child elements to an <entry>. Currently I'm simply defining the extra elements as top level elements, but this doesn't allow me to specify the occurrence indicators (minOccurs and maxOccurs).

What I want to do is to specify a number of elements that are required within each <entry> element. They can be new elements introduced by my schema (such as a <price> element that holds a product's price), as well as existing Atom elements (such as the <link> element, which is defined by Atom, but is not required).

Here is (a simplified version of) my current product-feed.xsd:

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema 
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.com/schemas/product-feed"
    xmlns:p="http://example.com/schemas/product-feed"
    xmlns:atom="http://www.w3.org/2005/Atom"
    elementFormDefault="qualified">

  <xs:element name="brand" type="xs:string" />

  <xs:element name="price" type="p:money" />

  <xs:element name="shipping" type="p:money" />

  <xs:complexType name="money">
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="currency" 
                      type="p:currency" 
                      use="required" />
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:simpleType name="currency">
    <xs:restriction base="xs:string">
      <xs:enumeration value="EUR" />
      <xs:enumeration value="USD" />
      <xs:enumeration value="GBP" />
    </xs:restriction>
  </xs:simpleType>

</xs:schema>

Here is an example xml feed:

<?xml version="1.0" encoding="UTF-8"?>

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:p="http://example.com/schemas/product-feed"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <title>Example Store</title>
  <link href="http://www.example-store.com/" rel="self" />
  <updated>2014-08-08T10:44:20Z</updated>

  <entry>
    <title>Foo</title>
    <link href="http://www.example-store.com/products/foo.html" />
    <p:price currency="EUR">32.95</p:price>
    <p:shipping currency="EUR">6.75</p:shipping>
  </entry>

  <entry>
    <title>Bar</title>
    <link href="http://www.example-store.com/products/acme-bar.html" />
    <p:brand>Acme</p:brand>
    <p:price currency="EUR">12.50</p:price>
    <p:shipping currency="EUR">6.75</p:shipping>
  </entry>

</feed>

How can I extend the Atom schema in such a way that my custom elements are only allowed inside the <entry> element, and that I can define how many times they may occur?

The only alternative solution I can think of would be to duplicate an Atom schema definition file (such as this one), and to modify that (adding my own elements, and changing the Atom elements that I want to require). This doesn't feel very good (I wouldn't be extending Atom anymore, I would simply be creating a whole new schema) so I'm hoping for a better solution.

Upvotes: 2

Views: 1711

Answers (1)

C. M. Sperberg-McQueen

Reputation: 25054

A full answer with explanations will require more time than I have available today, but the essentials of a solution can be sketched quickly.

First, the XSD you point to has a number of problems as a representation of the Atom format as described in RFC 4287.

  • It uses repeating choice-groups to represent RELAX NG interleave groups, which means it does not enforce any of the cardinality constraints of the spec's RNG schema in the Atom person construct, the atom:feed element, the atom:entry element, or the atom:source element.

In an XSD 1.1 schema, these are best represented using all-groups. In an XSD 1.0 schema, enforcing the minimum and maximum occurrence constraints on individual elements will require a rather verbose choice of sequences (themselves with nested choice groups), which is feasible but somewhat tedious to construct.

  • It uses a regular-expression pattern for email addresses which agrees neither with the prose of the Atom spec (which says the addresses must conform to RFC 2822) nor with the RNG schema (which uses the very simple expression ".+@.+").

Generating a regular expression which matches the addr-spec production of RFC 2822 and thus enforces the rules of the Atom spec is not possible. (The set of legal addr-spec values is context-free, not regular, because RFC 2822 comments nest.) Approximating it with a regular expression is possible, but a bit time-consuming and error-prone unless you do it systematically. The simplest solution is to follow the example of the RNG schema in the Atom spec and just require a string with at least one at-sign in it neither at the beginning nor at the end.
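A minimal sketch of that simplest option, written as an XSD simple type. It mirrors the `atomEmailAddress` pattern from the RELAX NG schema in RFC 4287; the type name `emailAddress` and the global `email` declaration are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:atom="http://www.w3.org/2005/Atom"
           targetNamespace="http://www.w3.org/2005/Atom"
           elementFormDefault="qualified">

  <!-- Hypothetical type name. Same pattern as atomEmailAddress in the
       RFC 4287 RNG schema: at least one character, an at-sign, and at
       least one more character. XSD patterns are implicitly anchored to
       the whole value, so no ^ or $ is needed. -->
  <xs:simpleType name="emailAddress">
    <xs:restriction base="xs:string">
      <xs:pattern value=".+@.+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical global declaration; in a full Atom schema, email
       would be a child of the person construct instead. -->
  <xs:element name="email" type="atom:emailAddress"/>
</xs:schema>
```

With this type, `a@example.com` is accepted, while `@example.com` and `a@` are rejected.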

So your first step will be to create (or find) an XSD which does a better job of representing the Atom document grammar.
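As a rough illustration of the XSD 1.1 all-group approach, here is a sketch of an atom:entry content model whose cardinalities follow the `atomEntry` interleave in the RFC 4287 RNG schema. The type name `entryType` is hypothetical, and most child types are simplified to `xs:anyType` placeholders; a real schema would define proper text, person, link and date constructs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only; requires an XSD 1.1 processor (all-groups with
     maxOccurs > 1 and wildcards inside xs:all are 1.1 features). -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:atom="http://www.w3.org/2005/Atom"
           targetNamespace="http://www.w3.org/2005/Atom"
           elementFormDefault="qualified">

  <xs:element name="entry" type="atom:entryType"/>

  <xs:complexType name="entryType">
    <!-- Children may appear in any order; counts are still enforced. -->
    <xs:all>
      <xs:element name="author"      type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="category"    type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="content"     type="xs:anyType" minOccurs="0"/>
      <xs:element name="contributor" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="id"          type="xs:anyURI"/>
      <xs:element name="link"        type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <!-- xs:dateTime only approximates the RFC 3339 date construct. -->
      <xs:element name="published"   type="xs:dateTime" minOccurs="0"/>
      <xs:element name="rights"      type="xs:anyType" minOccurs="0"/>
      <xs:element name="source"      type="xs:anyType" minOccurs="0"/>
      <xs:element name="summary"     type="xs:anyType" minOccurs="0"/>
      <xs:element name="title"       type="xs:anyType"/>
      <xs:element name="updated"     type="xs:dateTime"/>
      <!-- Extension elements from any non-Atom namespace. -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:all>
    <!-- Covers xml:base, xml:lang and other foreign attributes. -->
    <xs:anyAttribute namespace="##other" processContents="lax"/>
  </xs:complexType>
</xs:schema>
```

In XSD 1.0 the same constraints would need the much longer choice-of-sequences construction described above.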

Your three choices are then:

  • Modify the schema document (adding comments to describe your changes and pointing future readers to the base schema document from which you started) to import your namespace and add the specific elements you want to the content model for atom:entry.

This has the advantage that it's simple to do. It has the disadvantage that the relation between your modification of the Atom language and its definition in the Atom spec is only as clear as your natural-language comments make it. You worry that in that case you are not really extending the Atom language; I think you are extending the Atom language, but you are right to notice that you are not doing so by extending a stand-alone schema for Atom. That probably counts as a disadvantage.

  • Use xsd:redefine to redefine the type of the atom:entry element, by restricting the wildcard to the elements you want to see. Your redefinition must be a valid restriction of the type as defined in the base schema (but XSD does not guarantee that a conforming redefinition of a conforming schema will be a conforming schema, which makes the constraints on redefinition seem rather pointless to some users).

This has the advantage of being conformant to XSD 1.0; it has the disadvantage of probably being rather tricky and error-prone (just last week, I heard a prominent document designer say in public that he has never been able to get XSD redefinition to work). It also has the disadvantage that XSD 1.0 processors are known to have inconsistent implementations of redefine. (It should be noted, however, that the inconsistencies mostly show up in situations involving multiple redundant imports, includes, and redefinitions, and should not show up in the straightforward situation you describe.)

  • If XSD 1.1 is available to you, use xsd:override to modify the content model of atom:entry as appropriate.

This has the advantage of keeping your changes separate from the base schema, and being simpler to specify than xsd:redefine. It does however require XSD 1.1, which may or may not be supported by the tool chain you hope to use.
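A sketch of the xsd:override route for the product feed in the question. It assumes a base schema document at the hypothetical location `atom.xsd` that declares the Atom children (atom:title, atom:id, atom:link, and so on) as global elements and gives atom:entry a named complex type, here assumed to be called `entryType`. It also assumes the question's `product-feed.xsd`, which declares p:brand, p:price and p:shipping globally. The overriding document must use the Atom namespace as its target namespace, because xs:override, like xs:include, works within a single namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only (XSD 1.1). "atom.xsd" and the type name "entryType"
     are assumptions about the base schema, not part of any spec. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:atom="http://www.w3.org/2005/Atom"
           xmlns:p="http://example.com/schemas/product-feed"
           targetNamespace="http://www.w3.org/2005/Atom"
           elementFormDefault="qualified">

  <!-- The product-feed schema from the question. -->
  <xs:import namespace="http://example.com/schemas/product-feed"
             schemaLocation="product-feed.xsd"/>

  <!-- Unlike redefine, override does not require the replacement type
       to be a valid restriction or extension of the original. -->
  <xs:override schemaLocation="atom.xsd">
    <xs:complexType name="entryType">
      <xs:all>
        <xs:element ref="atom:author"      minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="atom:category"    minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="atom:content"     minOccurs="0"/>
        <xs:element ref="atom:contributor" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="atom:id"/>
        <!-- Optional in Atom, made mandatory for the product feed. -->
        <xs:element ref="atom:link"        minOccurs="1" maxOccurs="unbounded"/>
        <xs:element ref="atom:published"   minOccurs="0"/>
        <xs:element ref="atom:rights"      minOccurs="0"/>
        <xs:element ref="atom:source"      minOccurs="0"/>
        <xs:element ref="atom:summary"     minOccurs="0"/>
        <xs:element ref="atom:title"/>
        <xs:element ref="atom:updated"/>
        <!-- Product-feed elements with explicit occurrence counts. -->
        <xs:element ref="p:brand"    minOccurs="0"/>
        <xs:element ref="p:price"/>
        <xs:element ref="p:shipping"/>
      </xs:all>
      <xs:anyAttribute namespace="##other" processContents="lax"/>
    </xs:complexType>
  </xs:override>
</xs:schema>
```

Feeds would then be validated against this driver document instead of against `atom.xsd` directly. Because the replacement content model has no wildcard, an entry containing any element not listed above would be rejected.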

Upvotes: 3
