Stoufiler

Reputation: 195

Why am I getting an error for this regular expression using the `lxml` library in Python 2?

I have a problem with a regular expression in an XSD schema. lxml reports that the regular expression is not valid, although I believe it should be.

<xs:element name="birth_date">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="valeur">
        <xs:simpleType>
          <xs:union>
            <xs:simpleType>
              <xs:restriction base="xs:date"></xs:restriction>
            </xs:simpleType>
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern
                  value="(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)0?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})"></xs:pattern>
              </xs:restriction>
            </xs:simpleType>
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:maxLength value="0"></xs:maxLength>
              </xs:restriction>
            </xs:simpleType>
          </xs:union>
        </xs:simpleType>
      </xs:element>
      <xs:element type="xs:string" name="backgroundcolor"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

I get this error:

The value '(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)0?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})' of the facet 'pattern' is not a valid regular expression.

If you can help me, I would be very grateful.

Thanks in advance.

Upvotes: 0

Views: 1816

Answers (3)

C. M. Sperberg-McQueen

Reputation: 25034

In a conforming XSD schema, the value of pattern must be a string in the language defined for regular expressions by the XSD spec. In many details, that language follows the regular expressions of Perl, but there are several differences in semantics and syntax due to the way patterns are used in XSD.

  • Unlike Perl patterns, XSD patterns are regular expressions in the strict sense that they recognize regular languages. They do not include bells and whistles like parenthesis matching or substring repetition, which require more than regular-expression power.

  • Like all facets in the definition of a simple type, patterns are used to produce a Boolean result: either the candidate literal matches the regular expression or it does not. There is no facility in XSD for using parts of the literal string, and so there is no motivation for the convention that the regex engine retains a record of the part of the string matched by a particular regular expression so that that substring can be retrieved later by a reference of the form \1, \2, etc.

    Since \ is a metacharacter in XSD regular expressions, and \1 etc. are not defined as escape sequences, the string \1 is not a legal XSD regular expression, nor is any string containing it. (Ditto for \2 through \9.)

  • Since XSD regexes have no facilities for instructing the engine to remember substrings, there is no utility in distinguishing parenthesized expressions whose matching string should be remembered from those whose matching string need not be remembered (capturing and non-capturing groups).

    Since ? is a meta-character which must be preceded by a regular expression, and since ( is neither a legal regular expression nor the last character of any legal regular expression unless escaped (the string '\(' is a legal regular expression), the string (? is not a legal XSD regular expression. Nor can it appear as a substring of any legal XSD regular expression unless the left parenthesis is preceded by a backslash.

    XPath regular expressions do have instructions for capturing and non-capturing groups and they extend the XSD regular expression syntax accordingly.

  • Since patterns must be matched by the entire candidate literal, there is no need to distinguish anchored from non-anchored regular expressions.

    The characters '^' and '$' are legal in XSD regular expressions, but they are not meta-characters and they do not match the beginning and ending of lines or strings: they match the characters '^' and '$'.

There may be other issues with your regex; these are the ones that come to mind first.
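
As a rough way to locate the constructs listed above in an existing pattern, a small scanner can walk the string and flag them. This is only a sketch: the function name `find_non_xsd_constructs` is made up for illustration, it checks only the three issues mentioned in this answer, and its handling of character classes is simplified (it does not model XSD character class subtraction).

```python
def find_non_xsd_constructs(pattern):
    """Return (offset, description) pairs for constructs flagged above.

    Hypothetical helper, not part of lxml or any XSD library.
    """
    problems = []
    i = 0
    in_class = False  # True while scanning inside [...]
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "123456789":
                # \1 .. \9 are not defined escapes in XSD regexes
                problems.append((i, "backreference \\{}".format(nxt)))
            i += 2  # skip the escaped character as well
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(" and pattern[i + 1:i + 2] == "?":
            # covers (?: and other Perl-style group modifiers
            problems.append((i, "'(?' group syntax"))
        elif ch in "^$":
            # outside a character class these are literals in XSD
            problems.append((i, "'{}' is a literal, not an anchor".format(ch)))
        i += 1
    return problems


# Usage: a shortened fragment in the style of the pattern from the question
for offset, what in find_non_xsd_constructs(r"(?:31(\/|-|\.))\1$"):
    print(offset, what)
```

An empty result does not prove that a pattern is valid XSD; it only means none of these particular constructs were found.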

Upvotes: 1

Stoufiler

Reputation: 195

I used this pattern instead and it works. Thanks, guys.

<xs:pattern value="[0-3][0-9]-[01][0-9]-[0-9]{4}"></xs:pattern>
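
Note that this pattern only checks the shape of the string, so it is much looser than the original. A quick sketch with Python's `re.fullmatch` (Python 3.4+), used here as a stand-in for XSD's whole-string matching, shows what it accepts. The helper name `matches_like_xsd` and the sample values are illustrative assumptions.

```python
import re

# The simplified pattern from this answer, copied verbatim.
PATTERN = "[0-3][0-9]-[01][0-9]-[0-9]{4}"


def matches_like_xsd(value):
    """Approximate XSD matching: the whole value must match the pattern."""
    return re.fullmatch(PATTERN, value) is not None


print(matches_like_xsd("25-12-1990"))  # True: well-formed date
print(matches_like_xsd("39-19-0000"))  # True: shape is right, date is impossible
print(matches_like_xsd("25/12/1990"))  # False: only '-' is accepted as separator
print(matches_like_xsd("5-1-1990"))    # False: day and month need two digits
```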

Upvotes: 0

kjhughes

Reputation: 111521

XSD regular expressions do not support non-capturing groups (?:) and backreferences (\1, \2, etc). If you remove those constructs, your syntax errors will be eliminated.
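
One way to drop a backreference without allowing mixed separators is to write out one branch per separator. The sketch below builds such a pattern in Python. It is deliberately simplified: it assumes a four-digit year and does not reproduce the month-length and leap-year logic of the original regex. The names `SEPARATORS`, `DAY`, `MONTH`, `YEAR`, and `build_xsd_date_pattern` are made up for illustration, and `re.fullmatch` (Python 3.4+) is used only to imitate XSD's implicit whole-string matching.

```python
import re

# Separators from the question; the dot must be escaped, '/' and '-' need not be.
SEPARATORS = ["/", "-", r"\."]
DAY = "(0?[1-9]|[12][0-9]|3[01])"
MONTH = "(0?[1-9]|1[0-2])"
YEAR = "[0-9]{4}"  # assumption: four-digit years only


def build_xsd_date_pattern():
    """Return a pattern with plain groups only: no (?:...), \\1, ^ or $."""
    # One branch per separator replaces "(sep)...\1" from the original.
    branches = [DAY + sep + MONTH + sep + YEAR for sep in SEPARATORS]
    return "|".join(branches)


pattern = build_xsd_date_pattern()
print(pattern)  # value that could go into <xs:pattern value="..."/>

# Sanity checks in Python; an XSD validator remains the final authority.
for sample in ["31/12/1999", "1.2.2000", "31/12-1999", "32-01-2000"]:
    print(sample, re.fullmatch(pattern, sample) is not None)
```

The cost of this approach is a longer pattern; the benefit is that it stays within the constructs XSD patterns allow while still rejecting values that mix separators.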

Upvotes: 1
