Halle

Reputation: 33

Regex in xsd assertion limited to beginning of element value

I have created assertions in an XSD 1.1 schema that contain regular expressions. The expressions are supposed to reject values that begin with a Roman numeral or a number followed by a period and a space. From what I have read, I don't need to anchor a regular expression in XSD because it should already apply from the beginning of the value (I may have misunderstood that). However, I am not able to limit the regular expressions to the beginning of the value.

XSD:

   <xs:element name="node123">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:assertion test="not(matches($value, '[\d].*\.\s.|[I].*\.\s.*|[V].*\.\s.*|[X].*\.\s.*|[L].*\.\s.*|[C].*\.\s.*'))"/>
         <xs:assertion test="not(starts-with($value, '-'))"/>
         <xs:assertion test="not(starts-with($value, '–'))"/>
         <xs:assertion test="not(starts-with($value, '—'))"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>

False positives are:

Mismash of Fid. R. Crim. Z

Shipped C. O. D

I can't use starts-with with the number expressions because that doesn't work at all. However, when I use starts-with with the other expressions, it doesn't apply to the whole element value.

Is there a way to limit the expressions to just the first words, or the start, of the element value?

Upvotes: 3

Views: 867

Answers (2)

kjhughes

Reputation: 111591

Notes:

  1. XSD regular expressions in xsd:pattern facets are implicitly anchored at start (^) and end ($).

  2. XPath regular expressions, which are used in xsd:assertion, are not implicitly anchored.

Given the above, the regex provided by @WiktorStribiżew (along with his advice to try adding ^) is a reasonable approximation to your goal of excluding strings that begin with something that looks like an Arabic or Roman numeral:

  <xs:element name="node123">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:assertion test="not(matches($value, '^([-–—]|[0-9IVXLC]+\.\s)'))"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
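
To see why the anchor matters, here is a minimal sketch that runs both patterns against the two false positives from the question. It uses Python's re module only as a stand-in for XPath's matches(): re.search is likewise unanchored, but the two regex dialects are not identical (for example, the exact set of characters matched by \s differs), so treat it as an illustration rather than a validator. The names UNANCHORED, ANCHORED and rejected are made up for this example.

```python
import re

# Pattern from the question: no anchor, so it can match anywhere in the value.
UNANCHORED = r"[\d].*\.\s.|[I].*\.\s.*|[V].*\.\s.*|[X].*\.\s.*|[L].*\.\s.*|[C].*\.\s.*"

# Pattern from this answer: "^" ties the match to the start of the value.
ANCHORED = r"^([-–—]|[0-9IVXLC]+\.\s)"


def rejected(pattern, value):
    """Return True when the pattern is found, i.e. when an assertion of the
    form not(matches($value, pattern)) would fail for this value."""
    return re.search(pattern, value) is not None


samples = [
    "Mismash of Fid. R. Crim. Z",  # false positive reported in the question
    "Shipped C. O. D",             # false positive reported in the question
    "IV. Introduction",            # should be rejected
    "12. Item",                    # should be rejected
    "-dash",                       # should be rejected
]

for value in samples:
    print(value, rejected(UNANCHORED, value), rejected(ANCHORED, value))
```

The unanchored pattern rejects both sample values because the [C].*\.\s.* alternative finds "Crim. " and "C. " in the middle of the string; the anchored one only looks at the start.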

Upvotes: 2

NybbleStar

Reputation: 71

I would comment, but I don't have the reputation points yet. You can force your regex to match only at the beginning. Instead of doing this:

'[\d] ...

Do this:

'^\s*[\d] ..

The "^" anchors the match at the beginning of the value, and the "\s*" allows for any stray leading whitespace.
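
A rough sketch of the idea, using Python's re module as a stand-in for XPath's matches() (the dialects differ in details, so this is only an illustration). The rest of the pattern after "^\s*" is an assumption: the character class and the "\.\s" tail are borrowed from the other answer, and the names LEADING_WS_TOLERANT and starts_with_numbering are hypothetical.

```python
import re

# Assumption: only the "^\s*" prefix comes from this answer; the
# "[0-9IVXLC]+\.\s" part is borrowed from the other answer for illustration.
LEADING_WS_TOLERANT = r"^\s*[0-9IVXLC]+\.\s"


def starts_with_numbering(value):
    """True when the value begins (after optional whitespace) with a
    number-like token followed by a period and a whitespace character."""
    return re.search(LEADING_WS_TOLERANT, value) is not None


for value in ["  3. Third item", "IV. Scope", "Shipped C. O. D"]:
    print(repr(value), starts_with_numbering(value))
```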

Upvotes: 0
