Reputation: 621
I'm looking for the best regex to detect URLs in text. After trying many, I came across this article, in which the author shows his regex to be the most robust among many candidates. I'm trying to get this regex to work in Ruby and JavaScript, but both Rubular and Regexpal give me errors, and my attempts to fix them leave me with no matches. Much love to anyone who can help me translate this regex into Ruby- and JavaScript-compatible versions.
_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS
Upvotes: 2
Views: 1091
Reputation: 621
DMKE answered my original question best by linking me to a source I'd overlooked, so I accepted his answer. But after testing @diegoperini's regex, I was a bit underwhelmed. I ultimately settled on the following regex from Daring Fireball:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
It is liberal: it accepts port numbers and links without http: or www., and it still passed my tests. Plus, it is simple and easy to read. So I would recommend it to anyone who wants a quick, liberal regex for URLs.
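For anyone trying this in JavaScript: a leading `(?i)` is not accepted by JavaScript's RegExp, so the case-insensitivity has to move into the `i` flag. Below is a minimal sketch, assuming the pattern as published on Daring Fireball (with the literal parentheses and brackets escaped). The names `LIBERAL_URL` and `findUrls` are hypothetical and only there for illustration.

```javascript
// Sketch: the liberal Daring Fireball pattern as a JavaScript regex literal.
// Changes from the PCRE-style original: "(?i)" becomes the "i" flag,
// "/" is escaped as "\/", and "g" is added so match() returns every hit.
// LIBERAL_URL and findUrls are hypothetical names.
const LIBERAL_URL = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;

// Return every URL-like substring found in the text (empty array if none).
function findUrls(text) {
  return text.match(LIBERAL_URL) || [];
}

console.log(findUrls("Visit www.example.com, bit.ly/foo, or http://example.com/a(b)."));
// [ 'www.example.com', 'bit.ly/foo', 'http://example.com/a(b)' ]
```

Note how the trailing commas and the final period are left out of the matches, while the balanced parentheses in the last URL are kept.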
Upvotes: 0
Reputation: 4603
Have you seen the source? There are Ruby and JS ports embedded: gist.github.com/dperini/729294.
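In case it helps to see what such a translation involves, here is a minimal sketch of how the PCRE pattern from the question might be carried over to JavaScript by hand. It is not copied from the gist. The assumptions: the `_` delimiters and the `u` and `S` modifiers are dropped, `\x{00a1}-\x{ffff}` becomes `\u00a1-\uffff`, and only the `i` flag is kept. The names `UNICODE_RANGE`, `LABEL`, `URL_PATTERN` and `isUrl` are hypothetical.

```javascript
// Sketch: hand translation of the PCRE pattern in the question.
// All constant and function names here are hypothetical.
const UNICODE_RANGE = String.raw`\u00a1-\uffff`; // stands in for \x{00a1}-\x{ffff}
const LABEL = `(?:[a-z${UNICODE_RANGE}0-9]+-?)*[a-z${UNICODE_RANGE}0-9]+`;

const URL_PATTERN = new RegExp(
  [
    String.raw`^(?:(?:https?|ftp)://)`, // scheme
    String.raw`(?:\S+(?::\S*)?@)?`, // optional user:pass@
    "(?:",
    // lookaheads that reject private and local IPv4 ranges
    String.raw`(?!10(?:\.\d{1,3}){3})`,
    String.raw`(?!127(?:\.\d{1,3}){3})`,
    String.raw`(?!169\.254(?:\.\d{1,3}){2})`,
    String.raw`(?!192\.168(?:\.\d{1,3}){2})`,
    String.raw`(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})`,
    // IPv4 dotted quad
    String.raw`(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])`,
    String.raw`(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}`,
    String.raw`(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))`,
    "|",
    LABEL, // host name
    String.raw`(?:\.` + LABEL + ")*", // further domain labels
    String.raw`(?:\.(?:[a-z` + UNICODE_RANGE + "]{2,}))", // TLD
    ")",
    String.raw`(?::\d{2,5})?`, // optional port
    String.raw`(?:/[^\s]*)?$`, // optional path
  ].join(""),
  "i"
);

// True when the whole string is a URL according to the pattern.
function isUrl(candidate) {
  return URL_PATTERN.test(candidate);
}

console.log(isUrl("http://example.com/path?q=1")); // true
console.log(isUrl("http://8.8.8.8")); // true
console.log(isUrl("http://192.168.1.1")); // false (private range)
console.log(isUrl("example.com")); // false (no scheme)
```

For Ruby, the same idea should apply with `\u{00a1}-\u{ffff}` for the Unicode range. Keep in mind that `^` and `$` match at line boundaries in Ruby, so `\A` and `\z` are the anchors to use when validating a whole string.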
Upvotes: 1
Reputation: 98901
Ruby:
result = subject.scan(/http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/)
JavaScript:
result = subject.match(/http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/g);
The “perfect URL validation regex” that works in both Ruby and JavaScript is probably:
http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+
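To get a feel for what this pattern picks up, the JavaScript call above can be tried on a short sample string. This is only a sketch; `SIMPLE_URL`, `extractUrls` and the sample text are hypothetical.

```javascript
// Sketch: the JavaScript pattern from this answer, run on sample text.
// SIMPLE_URL and extractUrls are hypothetical names.
const SIMPLE_URL = /http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/g;

// Return all matches, or an empty array when match() yields null.
function extractUrls(subject) {
  return subject.match(SIMPLE_URL) || [];
}

console.log(extractUrls("Docs: https://example.com/a?b=1, then http://x.org."));
// [ 'https://example.com/a?b=1,', 'http://x.org.' ]
```

Inside the character class, `$-_` is a range from `$` to `_`, which also covers characters such as `,`, `.`, `/`, `:`, `=` and `?`. That is why trailing punctuation ends up in the matches, so it may need trimming afterwards if the URLs are embedded in sentences.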
Upvotes: 1