Quincy Larson

Reputation: 621

Trying to get the "perfect URL validation regex" to work in Ruby and JavaScript

I'm looking for the best regex to detect URLs in text. After trying many, I came across this article, where the author demonstrated his regex to be the most robust of those he compared. I'm trying to get this regex to work in Ruby and JavaScript, but both Rubular and Regexpal are giving me errors. When I've tried to fix the errors, I've gotten no matches. Much love to anyone who can help me translate this regex into Ruby- and JavaScript-compatible versions.

_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS

Upvotes: 2

Views: 1091

Answers (3)

Quincy Larson

Reputation: 621

DMKE answered my original question best by linking me to some source I'd overlooked, so I accepted his answer. But after testing @diegoperini's regex, I was a bit underwhelmed. I ultimately came across the following regex on Daring Fireball:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

It is liberal: it accepts port numbers and links without http: or www., yet it still passed my tests. Plus, it is simple and easy to read. So I would recommend this regex to anyone who wants a quick, liberal regex for URLs.
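For anyone trying this in JavaScript, here is a minimal sketch. It assumes the Daring Fireball pattern has its literal parentheses and brackets backslash-escaped (those escapes tend to get lost when the regex is pasted around). JavaScript regex literals don't accept a leading `(?i)`, so the sketch drops it and uses the `i` flag instead, and it escapes the forward slashes. The names `liberalUrl` and `findUrls` are just illustrative, not part of the original.

```javascript
// Liberal URL pattern adapted for a JavaScript regex literal:
// - the inline (?i) is replaced by the "i" flag
// - "/" is escaped as "\/"
// - "g" is added so that match() returns every URL, not just the first
const liberalUrl = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;

// Hypothetical helper: returns all URL-like substrings, or an empty array.
function findUrls(text) {
  return text.match(liberalUrl) || [];
}

console.log(findUrls("See http://example.com/path, or www.example.org."));
// → [ 'http://example.com/path', 'www.example.org' ]
```

Note that the trailing comma and period are left out of the matches, because the last character of a match may not be punctuation.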

Upvotes: 0

DMKE

Reputation: 4603

Have you seen the source? There are Ruby and JS ports embedded: gist.github.com/dperini/729294.
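In case it helps to see what a port involves, here is a rough JavaScript sketch of the regex from the question, written by hand rather than copied from the gist. The assumptions: the `_` delimiters and the `iuS` modifiers are PHP/PCRE syntax and are replaced by the JavaScript `i` flag, and each `\x{00a1}-\x{ffff}` range is rewritten as `\u00a1-\uffff`. The constant names and `isWebUrl` are made up for readability.

```javascript
// Character range for host labels; PCRE's \x{00a1}-\x{ffff} becomes \u00a1-\uffff.
const LABEL = String.raw`a-z\u00a1-\uffff0-9`;

// Required scheme, then optional user:password@ part.
const SCHEME = String.raw`(?:(?:https?|ftp)://)`;
const USERINFO = String.raw`(?:\S+(?::\S*)?@)?`;

// Negative lookaheads that reject private and local IPv4 ranges.
const NOT_PRIVATE =
  String.raw`(?!10(?:\.\d{1,3}){3})` +
  String.raw`(?!127(?:\.\d{1,3}){3})` +
  String.raw`(?!169\.254(?:\.\d{1,3}){2})` +
  String.raw`(?!192\.168(?:\.\d{1,3}){2})` +
  String.raw`(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})`;

// Dotted-quad IPv4 address, as in the question's pattern.
const PUBLIC_IPV4 =
  String.raw`(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])` +
  String.raw`(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}` +
  String.raw`(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))`;

// Host name, optional subdomain labels, then a TLD of at least two letters.
const HOST =
  String.raw`(?:(?:[${LABEL}]+-?)*[${LABEL}]+)` +
  String.raw`(?:\.(?:[${LABEL}]+-?)*[${LABEL}]+)*` +
  String.raw`(?:\.(?:[a-z\u00a1-\uffff]{2,}))`;

// Optional port and optional path.
const PORT_AND_PATH = String.raw`(?::\d{2,5})?(?:/[^\s]*)?`;

const webUrl = new RegExp(
  "^" + SCHEME + USERINFO +
  "(?:" + NOT_PRIVATE + PUBLIC_IPV4 + "|" + HOST + ")" +
  PORT_AND_PATH + "$",
  "i"
);

// Hypothetical helper: true if the whole string is a URL the pattern accepts.
function isWebUrl(candidate) {
  return webUrl.test(candidate);
}

console.log(isWebUrl("http://example.com/path?q=1")); // true
console.log(isWebUrl("http://192.168.1.1/"));         // false (private range)
```

Because the pattern is anchored with `^` and `$`, it validates a whole string rather than finding URLs inside longer text.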

Upvotes: 1

Pedro Lobito

Reputation: 98901

Ruby:

result = subject.scan(/http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/)

JavaScript:

result = subject.match(/http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/g);

The “perfect URL validation regex” that works in both Ruby and JavaScript is probably:

http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+
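A quick way to see how this behaves is to run the JavaScript version on some sample text. This is only a sketch; `urlPattern`, `extractUrls` and the sample strings are made up for illustration. One thing worth knowing: inside `[$-_@.&+]`, `$-_` is a character range (from `$` through `_`), so it also covers `/`, `:`, `?`, `=`, `,` and `.`.

```javascript
// Same pattern as above, as a JavaScript literal with the "g" flag.
const urlPattern = /http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/g;

// Hypothetical helper: returns every match, or an empty array if there is none.
function extractUrls(subject) {
  return subject.match(urlPattern) || [];
}

console.log(extractUrls("Docs at https://example.com/a?b=1, then http://test.org."));
// → [ 'https://example.com/a?b=1,', 'http://test.org.' ]

console.log(extractUrls("http://example.com/page#top"));
// → [ 'http://example.com/page' ]
```

So trailing commas and periods end up in the match, and a `#fragment` is cut off because `#` falls just outside the range. Depending on the text you are scanning, you may want to trim trailing punctuation afterwards.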

Upvotes: 1
