user9517536248

Reputation: 355

RegEx in xsd to check anyURI

I have the following element in my XML:

<DetailPageURL>http://www.amazon.com/exec/obidos/redirect?tag=bookmooch-20%26link_code=xm2%26camp=2025%26creative=165953%26path=http://www.amazon.com/gp/redirect.html%253fASIN=0001714600%2526tag=bookmooch-20%2526lcode=xm2%2526cID=2025%2526ccmID=165953%2526location=/o/ASIN/0001714600%25253FSubscriptionId=1AQVTEDADRW2C3ZDPCG2</DetailPageURL>

I want to restrict it so that its value always starts with:

http://www.amazon.com/exec/obidos/redirect?

Here's my code:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="DetailPageURL">
    <xsd:simpleType>
      <xsd:restriction base="xsd:anyURI">
        <xsd:pattern value="http://www.amazon.com/exec/obidos/redirect?[A-Za-z0-9]"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>

It's obviously not working because I didn't include special characters such as =, %, - and _ in the character class. But my question is: if I add them to my pattern like this:

<xsd:pattern value="http://www.amazon.com/exec/obidos/redirect?[A-Za-z0-9=-_%:/.]"/>

Will it work? Is this the right way to do it?

Upvotes: 0

Views: 1211

Answers (3)

Matti Virkkunen

Reputation: 65126

Something like this:

http://www\.amazon\.com/exec/obidos/redirect\?.+

ought to be sufficient. As Ian said in the comments, having anyURI as the base restriction already takes care of validating the general structure of the URL, so it's enough to validate for the required prefix.

And do remember to escape metacharacters such as . and ?.
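For illustration, here is a minimal sketch of the question's schema with this pattern dropped in. It assumes nothing beyond the element name and base type already shown in the question. Two details explain why the original pattern fails: XSD patterns must match the entire value (they are implicitly anchored), and an unescaped ? makes the preceding t optional, so the original only accepts "redirec" or "redirect" followed by exactly one alphanumeric character. Note also that in the proposed class [A-Za-z0-9=-_%:/.], the sequence =-_ is read as a range from = to _, not as three separate characters.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="DetailPageURL">
    <xsd:simpleType>
      <!-- anyURI handles the general URI check; the pattern only pins the prefix -->
      <xsd:restriction base="xsd:anyURI">
        <!-- "." and "?" are escaped so they match literally; ".+" requires a non-empty query part -->
        <xsd:pattern value="http://www\.amazon\.com/exec/obidos/redirect\?.+"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```

With this schema, the DetailPageURL value from the question should be accepted, while a value with a different host or path should be rejected by the pattern facet.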

Upvotes: 1

C. M. Sperberg-McQueen

Reputation: 25034

If you want to ensure that all legal IRIs beginning with the string you specify, and only those, are valid against your type, you will want either

(1) to import the IRI types mentioned in Michael Kay's answer, and then restrict them using the pattern suggested by Matti Virkkunen (or a looser one, substituting . for \S).

or

(2) to use a pattern like

http://www\.amazon\.com/exec/obidos/redirect\?(([A-Za-z0-9\-\._~&#xA0;-&#xD7FF;&#xF900;-&#xFDCF;&#xFDF0;-&#xFFEF;&#x10000;-&#x1FFFD;&#x20000;-&#x2FFFD;&#x30000;-&#x3FFFD;&#x40000;-&#x4FFFD;&#x50000;-&#x5FFFD;&#x60000;-&#x6FFFD;&#x70000;-&#x7FFFD;&#x80000;-&#x8FFFD;&#x90000;-&#x9FFFD;&#xA0000;-&#xAFFFD;&#xB0000;-&#xBFFFD;&#xC0000;-&#xCFFFD;&#xD0000;-&#xDFFFD;&#xE1000;-&#xEFFFD;]|(&#37;[0-9A-Fa-f][0-9A-Fa-f])|[!$&amp;'()*+,;=:@])|[&#xE000;-&#xF8FF;&#xF0000;-&#xFFFFD;&#x100000;-&#x10FFFD;/?])*

This pattern was constructed by manually expanding the definition of the entity iquery in the types mentioned by Michael Kay. It shows that matching the actual grammar of URIs or IRIs is not really all that hard, although sometimes a bit tedious.

It is simpler and less error-prone to import the absolute-URI-3986 or absolute-IRI-3987 types and then restrict them further.
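A rough sketch of that import-and-restrict approach might look like the following. The type name absolute-IRI-3987 and the schema location come from this thread, but the lib prefix and the namespace URI are placeholders I made up: check the targetNamespace attribute of the downloaded library schema and substitute its real value (if the library turns out to have no target namespace, the import and the type reference would need adjusting accordingly).

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:lib="urn:example:REPLACE-WITH-LIBRARY-TARGET-NAMESPACE">
  <!-- Placeholder namespace: replace with the targetNamespace declared in the library schema -->
  <xsd:import namespace="urn:example:REPLACE-WITH-LIBRARY-TARGET-NAMESPACE"
              schemaLocation="http://www.w3.org/2011/04/XMLSchema/TypeLibrary-IRI-RFC3987.xsd"/>
  <xsd:element name="DetailPageURL">
    <xsd:simpleType>
      <!-- The base type enforces the IRI grammar; the pattern below adds the prefix requirement -->
      <xsd:restriction base="lib:absolute-IRI-3987">
        <xsd:pattern value="http://www\.amazon\.com/exec/obidos/redirect\?.+"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```

Because pattern facets added at different derivation steps must all be satisfied, a value has to match both the library's grammar and the prefix pattern, which is why the simple prefix pattern is enough here.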

Upvotes: 0

Michael Kay

Reputation: 163342

Michael Sperberg-McQueen has defined types that match different flavours of URI in

http://www.w3.org/2011/04/XMLSchema/TypeLibrary-URI-RFC3986.xsd

and

http://www.w3.org/2011/04/XMLSchema/TypeLibrary-IRI-RFC3987.xsd

To see the way these complex regular expressions are constructed, view these documents at the raw XML level using (for example) curl.

Upvotes: 1
