Reputation: 7135
I've only recently started learning regex, so I'm not yet sure about a couple of aspects of the whole thing.
Right now my web page reads in the URL, breaks it up into parts, and uses only certain parts for processing:
E.g. 1) http://mycontoso.com/products/luggage/selloBag
E.g. 2) http://mycontoso.com/products/luggage/selloBag.sf404.aspx
For some reason Sitefinity is giving us both possibilities, which is fine, but what I need from this is only the actual product details, as in "luggage/selloBag".
My current regex is: "(.*)(map-search)(\/)(.*)(\.sf404\.aspx)"
I combine this with a replace statement and extract the contents of group 4 (or $4), which is fine, but it doesn't work for example 1, which lacks the .sf404.aspx suffix.
So the question is: Is it possible to match 2 possibilities with regular expressions where a part of a string might or might not be there and then still reference a group whose value you actually want to use?
Upvotes: 0
Views: 1492
Reputation: 39055
You don't say whether you're doing this in JavaScript, but if you are, the parseUri lib written by Steven Levithan does a pretty damn good job of parsing URLs. You can get it from various places, including here (click on the "Source Code" tab) and here.
Upvotes: 0
Reputation: 34395
RFC-3986 is the authority regarding URIs. Appendix B provides this regex to break one down into its components:
re_3986 = r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
# Where:
# scheme = $2
# authority = $4
# path = $5
# query = $7
# fragment = $9
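As a rough illustration of how those group numbers map onto one of the URLs from the question, here is a minimal sketch using Python's `re` module. The variable names are only for illustration, and note that the path group still contains the `/products/` prefix and the `.sf404.aspx` suffix, so it would need further trimming to get the product part.

```python
import re

# RFC 3986 Appendix B regex, as quoted above.
re_3986 = r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"

# Sample URL taken from the question.
url = "http://mycontoso.com/products/luggage/selloBag.sf404.aspx"

m = re.match(re_3986, url)

scheme = m.group(2)     # "http"
authority = m.group(4)  # "mycontoso.com"
path = m.group(5)       # "/products/luggage/selloBag.sf404.aspx"
query = m.group(7)      # None, because the URL has no "?..." part
fragment = m.group(9)   # None, because the URL has no "#..." part

print(scheme, authority, path, query, fragment)
```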
Here is an enhanced (and commented) regex (in Python syntax) which utilizes named capture groups:
import re

re_3986_enhanced = re.compile(r"""
# Parse and capture RFC-3986 Generic URI components.
^ # anchor to beginning of string
(?: (?P<scheme> [^:/?#\s]+): )? # capture optional scheme
(?://(?P<authority> [^/?#\s]*) )? # capture optional authority
(?P<path> [^?#\s]*) # capture required path
(?:\?(?P<query> [^#\s]*) )? # capture optional query
(?:\#(?P<fragment> [^\s]*) )? # capture optional fragment
$ # anchor to end of string
""", re.MULTILINE | re.VERBOSE)
For more information regarding the picking apart and validation of a URI according to RFC-3986, you may want to take a look at an article I've been working on: Regular Expression URI Validation
Upvotes: 5
Reputation: 363627
Depends on your regex implementation, but most support a syntax like
(\.sf404\.aspx|)
This assumes that's your group 4 (i.e. with zero-indexed groups). The | separates two alternatives, one of which is the empty string, so the suffix becomes optional.
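To make that concrete, here is a small sketch in Python (the question doesn't say which language is in use, so this is only an illustration). The pattern, the `/products/` prefix, and the `product_path` helper are assumptions based on the sample URLs in the question, not the asker's actual code. One detail that seems to matter: the group in front of the optional suffix is made lazy (`.*?`) and the pattern is anchored with `$`; with a greedy `.*` the wanted group would swallow `.sf404.aspx` and the empty alternative would match instead.

```python
import re

# Hypothetical pattern: "/products/" is assumed from the sample URLs.
# Group 1 is the part we want; group 2 is the optional suffix,
# written as an alternation with the empty string.
product_re = re.compile(r"^.*/products/(.*?)(\.sf404\.aspx|)$")

def product_path(url):
    """Return the product part of the URL, or None if it doesn't match."""
    m = product_re.match(url)
    return m.group(1) if m else None

# Both forms from the question yield the same product part.
with_suffix = product_path("http://mycontoso.com/products/luggage/selloBag.sf404.aspx")
without_suffix = product_path("http://mycontoso.com/products/luggage/selloBag")
print(with_suffix, without_suffix)
```

In implementations that support it, `(?:\.sf404\.aspx)?` should behave the same way without adding an extra capture group.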
Upvotes: 0