r3nrut

Reputation: 1055

Ruby RegEx issue

I'm having a problem getting my RegEx to work with my Ruby script.

Here is what I'm trying to match:

http://my.test.website.com/{GUID}/{GUID}/

Here is the RegEx that I've tested, which should match the string shown above:

/([-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)/

3 capturing groups:

group 1: ([-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)
group 2: (\/[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)
group 3: ([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])

Ruby is giving me an error when trying to validate a match against this regex:

empty range in char class: (My RegEx goes here) (SyntaxError)

I appreciate any thoughts or suggestions on this.

Upvotes: 0

Views: 1185

Answers (2)

mu is too short

Reputation: 434615

You could simplify things a bit by using URI to deal with parsing the URL, \h (a hex digit) in the regex, and scan to pull out the GUIDs:

require 'uri'

uri   = URI.parse(your_url)   # your_url is the URL string to inspect
path  = uri.path
guids = path.scan(/\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/)   # \h matches one hex digit

If you need any of the non-path components of the URL, you can easily pull them out of uri.

You might need to tighten things up a bit depending on your data, or it might be sufficient to check that guids has two elements.
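As a rough sketch of that last check, the snippet below wraps the three lines above in a helper and returns the GUIDs only when exactly two are found. The names guids_from and GUID_PATTERN are hypothetical, made up for illustration, and the "exactly two" rule is an assumption about your data:

```ruby
require 'uri'

# Hypothetical constant: one GUID, using \h for a single hex digit.
GUID_PATTERN = /\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/

# Hypothetical helper: returns an array of the two GUIDs found in the
# URL's path, or nil when the path does not contain exactly two.
def guids_from(url)
  path  = URI.parse(url).path.to_s
  guids = path.scan(GUID_PATTERN)
  guids.length == 2 ? guids : nil
end
```

If you also need to be sure nothing else appears in the path, you would have to anchor a pattern against the whole path rather than rely on scan alone.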

Upvotes: 4

Amadan

Reputation: 198324

You have several errors in your RegEx. I am very sleepy now, so I'll just give you a hint instead of a solution:

...[\/\/[0-9a-fA-F]....

The first [ does not belong there. Also, having \/\/ inside [] is unnecessary: you only need each character once inside []. Also,

...[-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}...

is greedy and includes a period. Indeed, as far as I can see, it includes every character that can come after it, so it effectively swallows the whole string (once you get rid of the other bugs). Consider {2,256}? instead.
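For illustration only, here is one way the hints above might be applied. This is a sketch under assumptions, not the asker's intended pattern: it assumes the URL is exactly scheme, host, and two GUID path segments; it narrows the host character class so it cannot run past the first slash (a swapped-in alternative to the lazy {2,256}? quantifier); and the names HEX_GUID and URL_WITH_GUIDS are hypothetical.

```ruby
# Hypothetical constant: one GUID, with no stray [ and no nested class.
HEX_GUID = /[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}/

# Hypothetical pattern: host, then two GUID segments, then a trailing slash.
# Using %r{...} means the slashes do not need escaping; the x flag allows
# whitespace and comments inside the pattern.
URL_WITH_GUIDS = %r{
  \A
  https?://
  ([-a-zA-Z0-9.]+\.[a-z]{2,4})   # group 1: host, cannot contain a slash
  /(#{HEX_GUID})                 # group 2: first GUID
  /(#{HEX_GUID})                 # group 3: second GUID
  /\z
}x
```

Calling URL_WITH_GUIDS.match(url) returns a MatchData whose groups 1 to 3 hold the host and the two GUIDs, or nil when the URL does not have this shape.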

Upvotes: 3
