Reputation: 355
I have an element in my XML like this:
<DetailPageURL>http://www.amazon.com/exec/obidos/redirect?tag=bookmooch-20%26link_code=xm2%26camp=2025%26creative=165953%26path=http://www.amazon.com/gp/redirect.html%253fASIN=0001714600%2526tag=bookmooch-20%2526lcode=xm2%2526cID=2025%2526ccmID=165953%2526location=/o/ASIN/0001714600%25253FSubscriptionId=1AQVTEDADRW2C3ZDPCG2</DetailPageURL>
I want to restrict it so that the value always starts with:
http://www.amazon.com/exec/obidos/redirect?
Here's my code:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="DetailPageURL">
    <xsd:simpleType>
      <xsd:restriction base="xsd:anyURI">
        <xsd:pattern value="http://www.amazon.com/exec/obidos/redirect?[A-Za-z0-9]"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
It's obviously not working because I didn't allow special characters such as =, %, - and _ in it. But my question is: if I add them to my pattern like this:
<xsd:pattern value="http://www.amazon.com/exec/obidos/redirect?[A-Za-z0-9=-_%:/.]"/>
Is it gonna work? Is it the right way of doing it?
Upvotes: 0
Views: 1211
Reputation: 65126
Something like this:
http://www\.amazon\.com/exec/obidos/redirect\?.+
ought to be sufficient. As Ian said in the comments, having anyURI as the base restriction already takes care of validating the general structure of the URL, so it's enough to validate for the required prefix.
And do remember to escape metacharacters such as "." and "?".
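Dropped into the schema from the question, that pattern might look like the sketch below. It assumes the element name and the anyURI base type from the question are kept unchanged; only the pattern value differs.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="DetailPageURL">
    <xsd:simpleType>
      <xsd:restriction base="xsd:anyURI">
        <!-- XSD patterns must match the entire value, so no ^ or $ anchors
             are needed. \. and \? match a literal dot and question mark;
             unescaped, "redirect?" would mean "redirec" plus an optional "t".
             .+ then accepts one or more further characters. -->
        <xsd:pattern value="http://www\.amazon\.com/exec/obidos/redirect\?.+"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```

The character-class attempts in the question fail for two further reasons: a bracket expression with no quantifier after it matches exactly one character, and =-_ inside brackets is read as a range from = to _ rather than as three literal characters.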
Upvotes: 1
Reputation: 25034
If you want to ensure that all legal IRIs beginning with the string you specify, and only those, are valid against your type, you will want either
(1) to import the IRI types mentioned in Michael Kay's answer, and then restrict them using the pattern suggested by Matti Virkunen (or a looser one, substituting . for \S),
or
(2) to use a pattern like
http://www\.amazon\.com/exec/obidos/redirect\?(([A-Za-z0-9\-\._~ -퟿豈-﷏ﷰ-￯𐀀-🿽𠀀-𯿽𰀀-𿿽񀀀-񏿽񐀀-񟿽񠀀-񯿽񰀀-񿿽򀀀-򏿽򐀀-򟿽򠀀-򯿽򰀀-򿿽󀀀-󏿽󐀀-󟿽󡀀-󯿽]|(%[0-9A-Fa-f][0-9A-Fa-f])|[!$&'()*+,;=:@])|[-󰀀-󿿽􀀀-􏿽/?])*
This pattern was constructed by manually expanding the definition of the entity iquery in the types mentioned by Michael Kay. It shows that matching the actual grammar of URIs or IRIs is not really all that hard, although sometimes a bit tedious.
It is simpler and less error-prone to import the absolute-URI-3986 or absolute-IRI-3987 types and then restrict them further.
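A rough sketch of the import-and-restrict approach follows. The namespace URI and the uri prefix are placeholders, not taken from the library: check the targetNamespace declared in the library file and substitute it (if the file declares none, drop the namespace attribute and the prefix). The type name absolute-URI-3986 is the one mentioned above.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:uri="urn:example:uri-type-library">
  <!-- ASSUMPTION: "urn:example:uri-type-library" is a placeholder.
       Replace it with the targetNamespace declared in the library file. -->
  <xsd:import namespace="urn:example:uri-type-library"
              schemaLocation="http://www.w3.org/2011/04/XMLSchema/TypeLibrary-URI-RFC3986.xsd"/>
  <xsd:element name="DetailPageURL">
    <xsd:simpleType>
      <!-- The base type supplies the RFC 3986 grammar check. -->
      <xsd:restriction base="uri:absolute-URI-3986">
        <!-- A pattern added in a later derivation step must hold in
             addition to the base type's patterns, so this facet only
             has to pin the required prefix. -->
        <xsd:pattern value="http://www\.amazon\.com/exec/obidos/redirect\?.+"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```

With this arrangement the long hand-expanded pattern is not needed in your own schema, since the imported type already carries it.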
Upvotes: 0
Reputation: 163342
Michael Sperberg-McQueen has defined types that match different flavours of URI in
http://www.w3.org/2011/04/XMLSchema/TypeLibrary-URI-RFC3986.xsd
and
http://www.w3.org/2011/04/XMLSchema/TypeLibrary-IRI-RFC3987.xsd
To see the way these complex regular expressions are constructed, view these documents at the raw XML level using (for example) curl.
Upvotes: 1