Reputation: 33
I have created assertions in an XSD 1.1 schema that contain regular expressions. The expressions are supposed to reject element values that begin with a Roman numeral or a number followed by a period and a space. From what I have read, I don't need to anchor a regular expression in XSD because it should already apply from the beginning of the value (I may have misunderstood that). However, I am not able to limit the regular expressions to the beginning of the value.
XSD:
<xs:element name="node123">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:assertion test="not(matches($value, '[\d].*\.\s.|[I].*\.\s.*|[V].*\.\s.*|[X].*\.\s.*|[L].*\.\s.*|[C].*\.\s.*'))"/>
<xs:assertion test="not(starts-with($value, '-'))"/>
<xs:assertion test="not(starts-with($value, '–'))"/>
<xs:assertion test="not(starts-with($value, '—'))"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
False positives are:
Mismash of Fid. R. Crim. Z
Shipped C. O. D
I can't use starts-with with the number expressions because that doesn't work at all. However, when I use starts-with with the other expressions, it doesn't apply to the whole element value.
Is there a way to limit the expressions to just the first words, or the start, of the element's value?
Upvotes: 3
Views: 867
Reputation: 111591
Notes:
XSD regular expressions in xsd:pattern facets are implicitly anchored at start (^) and end ($).
XPath regular expressions, which are utilized in xsd:assertion, are not implicitly anchored.
Given the above, the regex provided by @WiktorStribiżew (along with his advice to try adding ^) is a reasonable approximation to your goal of excluding strings that start with what look like Arabic or Roman numerals:
<xs:element name="node123">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:assertion test="not(matches($value, '^([-–—]|[0-9IVXLC]+\.\s)'))"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
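A quick way to sanity-check the anchoring is to run both patterns outside the schema. The sketch below uses Python's re module, which is an assumption made purely for illustration: it is not the XPath regex engine, although re.search is likewise unanchored unless ^ is given, and these patterns only use syntax the two flavors share. The helper name rejected is hypothetical.

```python
import re

# Unanchored pattern from the question and anchored pattern from this answer.
UNANCHORED = r"[\d].*\.\s.|[I].*\.\s.*|[V].*\.\s.*|[X].*\.\s.*|[L].*\.\s.*|[C].*\.\s.*"
ANCHORED = r"^([-–—]|[0-9IVXLC]+\.\s)"


def rejected(pattern: str, value: str) -> bool:
    """Return True when the assertion not(matches($value, pattern)) would fail.

    re.search looks for a match anywhere in the string, which mirrors how
    an unanchored XPath matches() call behaves.
    """
    return re.search(pattern, value) is not None


samples = [
    "Mismash of Fid. R. Crim. Z",  # false positive reported in the question
    "Shipped C. O. D",             # false positive reported in the question
    "IV. Introduction",            # should be rejected
    "12. Scope",                   # should be rejected
    "Plain title",                 # should be accepted
]

for value in samples:
    print(f"{value!r}: unanchored={rejected(UNANCHORED, value)}, "
          f"anchored={rejected(ANCHORED, value)}")
```

With the anchored pattern, the two false positives from the question are no longer rejected, because the "C. " sequences in them do not sit at the start of the value. A value that genuinely begins with such a sequence (for example "C. O. D shipped") would still be rejected, which is why this is only an approximation.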
Upvotes: 2
Reputation: 71
I would comment, but I don't have enough reputation points yet. You can force your regex to match at the beginning of the value. Instead of doing this:
'[\d] ...
Do this:
'^\s*[\d] ..
The "^" anchors the match at the beginning of the value, and the "\s*" handles any rogue white space before the number.
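To see what the "\s*" part buys you, here is a small sketch in Python's re module (an assumption for illustration, not the XPath engine the schema uses). The two patterns are hypothetical: they combine the "^\s*" prefix with the character class from the other answer, since the full expression is not spelled out here.

```python
import re

# Hypothetical patterns: same numeral check, with and without
# tolerance for leading whitespace.
STRICT = r"^[0-9IVXLC]+\.\s"
LENIENT = r"^\s*[0-9IVXLC]+\.\s"


def starts_with_numeral(pattern: str, value: str) -> bool:
    """Return True when the value begins with a numeral, a period and a space."""
    return re.search(pattern, value) is not None


padded = "  IV. Introduction"  # leading spaces before the numeral

print(starts_with_numeral(STRICT, padded))
print(starts_with_numeral(LENIENT, padded))
print(starts_with_numeral(LENIENT, "Shipped C. O. D"))
```

The strict pattern misses the padded value because the numeral is not the very first character, while the lenient one catches it and still leaves "Shipped C. O. D" alone.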
Upvotes: 0