Vladimir Bashkirtsev

xmllint fails to validate xs:language

I am trying to validate the following XML Schema (test.xsd):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dc="http://dublincore.org/schemas/xmls/qdc/2008/02/11/dc.xsd">
  <xs:element name="feature">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:annotation>
          <xs:documentation xml:lang="de-x-mt">
            <dc:title xml:lang="de-x-mt">Berg</dc:title>
            <dc:title xml:lang="en-x-mt">Mountain</dc:title>
          </xs:documentation>
        </xs:annotation>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>

using the following xmllint build:

xmllint: using libxml version 20901
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib Lzma

xmllint says:

test.xsd:7: element documentation: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}documentation', attribute '{http://www.w3.org/XML/1998/namespace}lang': 'de-x-mt' is not a valid value of the atomic type 'xs:language'.
WXS schema test.xsd failed to compile

I cannot figure out why "de-x-mt" is not a valid "xs:language" value on the xs:documentation element while the same "de-x-mt" is valid on the dc:title element. Both attributes come from the xml namespace and should be treated the same way. It is the same attribute, isn't it?

According to "W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes", xs:language is defined as:

the set of all strings that conform to the pattern

[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*

Clearly de-x-mt matches this pattern.
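
As a sanity check of that claim, the pattern can be tested outside of xmllint. The following is only a sketch, assuming a POSIX system that provides <regex.h>; the function name matches_xs_language_pattern is made up for illustration. The explicit ^ and $ anchors are added because XSD patterns are implicitly anchored, whereas POSIX extended regular expressions are not.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of xmllint or libxml2).
 * Returns 1 if `tag` matches the xs:language lexical pattern quoted above,
 * 0 if it does not, and -1 if the pattern itself fails to compile. */
int matches_xs_language_pattern(const char *tag)
{
    /* Same pattern as in the spec, with explicit anchors for POSIX ERE. */
    static const char *pattern = "^[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*$";
    regex_t re;
    int rc;

    if (tag == NULL)
        return 0;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, tag, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

Called with "de-x-mt" this returns 1, so the value does satisfy the lexical pattern; the rejection has to come from somewhere other than the pattern itself.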

Is this an xmllint bug? How can I get xmllint to accept such xs:language values?

Answers (1)

Vladimir Bashkirtsev

Digging into libxml2 (the library that xmllint is built on), I found the following comment on the function xmlCheckLanguageID() in parser.c:

The parser below doesn't try to cope with extension or privateuse
that could be added but that's not interoperable anyway

Further down there is this code:

if (nxt - cur == 4)
     goto script;
if (nxt - cur == 2)
     goto region;
if ((nxt - cur >= 5) && (nxt - cur <= 8))
     goto variant;
if (nxt - cur != 3)
     return(0);

As you can see, there is no allowance for a singleton, that is, a one-character subtag such as the "x" in de-x-mt. In other words, libxml2's xmlCheckLanguageID() does not conform to the standard where extension or private-use singletons are concerned.

To correct the issue, the following check needs to be added:

  if (nxt - cur == 1)
       goto extensions;

and, further down, the matching label:

    /* extensions and private use subtags not checked */
extensions:
    return (1);
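
To illustrate the idea in isolation, here is a standalone sketch of a length-based subtag check that also accepts singletons. It is not libxml2's code: the names classify_subtag_length and check_language_tag_sketch are invented for this example, subtag order is not enforced, and the only rule assumed for a singleton is that at least one more subtag must follow it.

```c
#include <ctype.h>
#include <stddef.h>

/* Subtag kinds, keyed purely on length as in the snippet quoted above. */
enum subtag_kind { SUBTAG_INVALID, SUBTAG_SINGLETON, SUBTAG_REGION,
                   SUBTAG_EXTLANG, SUBTAG_SCRIPT, SUBTAG_VARIANT };

/* Hypothetical helper: classify a subtag by its length alone. */
enum subtag_kind classify_subtag_length(size_t len)
{
    if (len == 1) return SUBTAG_SINGLETON;  /* the case missing above */
    if (len == 2) return SUBTAG_REGION;
    if (len == 3) return SUBTAG_EXTLANG;
    if (len == 4) return SUBTAG_SCRIPT;
    if (len >= 5 && len <= 8) return SUBTAG_VARIANT;
    return SUBTAG_INVALID;
}

/* Hypothetical, simplified checker (an assumption, not libxml2 code).
 * Returns 1 if the tag is accepted, 0 otherwise. */
int check_language_tag_sketch(const char *tag)
{
    const char *cur = tag;
    const char *nxt;
    int seen_singleton = 0;

    if (tag == NULL)
        return 0;

    /* Primary subtag: 1 to 8 ASCII letters. */
    nxt = cur;
    while (isalpha((unsigned char)*nxt))
        nxt++;
    if (nxt == cur || nxt - cur > 8)
        return 0;

    while (*nxt == '-') {
        cur = nxt + 1;
        nxt = cur;
        while (isalnum((unsigned char)*nxt))
            nxt++;

        if (seen_singleton) {
            /* After a singleton, accept any subtag of 1 to 8 characters. */
            if (nxt == cur || nxt - cur > 8)
                return 0;
            continue;
        }

        switch (classify_subtag_length((size_t)(nxt - cur))) {
        case SUBTAG_INVALID:
            return 0;
        case SUBTAG_SINGLETON:
            /* A singleton must be followed by at least one more subtag. */
            if (*nxt != '-')
                return 0;
            seen_singleton = 1;
            break;
        default:
            break;
        }
    }

    /* Anything other than end of string here is an illegal character. */
    return *nxt == '\0';
}
```

With this sketch, "de-x-mt" and "en-x-mt" are accepted, while a dangling singleton such as "de-x" is rejected.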

I have submitted a bug report at https://bugzilla.gnome.org/show_bug.cgi?id=749763
