Reputation: 3448
To start off, I know this is bad practice. I know there are libraries out there that are supposed to help with this; however, this is the task I was assigned, and changing this whole thing to work with a library would be much more work than we can take on right now (we are on a tight time frame).
In our web app we have fields that people usually type URLs into. We have been assigned a task to 'linkify' anything that looks like a URL. The people who wrote our app seem to have used a regex to determine whether a string of text is a URL, and I am basing my regex off theirs (I am no regex guru, not even a novice).
The 'search' regex looks like this:
function DoesTextContainLinks(linkText) {
    // returns true if the text looks like it contains a URL
    var linkifyValue = /((ftp|https?):\/\/)?(www\.)?([a-zA-Z0-9\-]{1,}\.){1,}[a-zA-Z0-9]{1,4}(:[0-9]{1,5})?(\/[a-zA-Z0-9\-\_\.\?\&\#]{1,})*(\/)?$/.test(linkText);
    return linkifyValue;
}
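For reference, this is roughly how that check behaves on a few sample strings (the inputs below are mine, purely for illustration):
// sample inputs are illustrative, not from the original app
console.log(DoesTextContainLinks("www.example.com"));        // true
console.log(DoesTextContainLinks("http://www.example.com")); // true - the scheme and www. are both optional
console.log(DoesTextContainLinks("just some plain text"));   // false - no dotted domain-like part to match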
Using this regex and https://regex101.com/, I have come up with two regexes that work most of the time.
function WrapLinkTextInAnchorTag(linkText) {
    // capture links that only have www and add http to the beginning of them (this regex ignores entries that have http, https, and ftp in them; they are handled by the next regex)
    linkText = linkText.replace(/(^(?:(?!http).)*^(?:(?!ftp).)(www\.)?([a-zA-Z0-9\-]{1,}\.){1,}[a-zA-Z0-9]{1,4}(:[0-9]{1,5})?(\/[a-zA-Z0-9\-\_\.\?\&\#]{1,})*(\/)?$)/gim, "<a href='http://$1'>$1</a>");
    // capture links that already have https, http, or ftp on them and fix those too; no need to prepend http here
    linkText = linkText.replace(/(((https|http|ftp?):\/\/)?(www\.)?([a-zA-Z0-9\-]{1,}\.){1,}[a-zA-Z0-9]{1,4}(:[0-9]{1,5})?(\/[a-zA-Z0-9\-\_\.\?\&\#]{1,})*(\/)?$)/gim, "<a href='$1'>$1</a>");
    return linkText;
}
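For the simple cases the two passes do what I want; a quick sketch of what the calls are intended to produce (sample inputs are mine):
// illustrative calls; the outputs shown are what the two replace passes are meant to produce
console.log(WrapLinkTextInAnchorTag("www.something.com"));
// -> <a href='http://www.something.com'>www.something.com</a>
console.log(WrapLinkTextInAnchorTag("http://www.something.com"));
// -> <a href='http://www.something.com'>http://www.something.com</a>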
The problem here is that some complex URLs don't seem to work, and I can't figure out exactly why. regex101 is pretty badass in that it tells you what each part of a regex is doing; however, my trouble is combining these pieces to get them to do what I want. I have two scenarios to account for: when a user types www.something.com or ftp.something.com, and when a user actually types http://www.something.com.
I am looking for some help in pointing out exactly what is wrong with my two regexes that prevents them from capturing complicated URLs like the one below:
https://pw.something.com/AAPS/default.aspx?guid=a5741c35-6fe1-31a1-b555-4028e931642b
Upvotes: 0
Views: 69
Reputation: 1266
If you look closely you will notice that nowhere in your regexps do you match an = character. That's what's breaking on the example you give.
Changing the second regexp by adding a \= to the characters supported in the path:
linkText.replace(/(((https|http|ftp?):\/\/)?(www\.)?([a-zA-Z0-9\-]{1,}\.){1,}[a-zA-Z0-9]{1,4}(:[0-9]{1,5})?(\/[a-zA-Z0-9\-\_\.\?\&\#\=]{1,})*(\/)?$)/gim, "<a href='$1'>$1</a>");
causes your example URL to match. That said, it may be worth slogging through the RFC on URLs (http://www.ietf.org/rfc/rfc3986.txt) to find other characters that might be allowed in URLs (even if they have special meanings), because you're probably missing some others.
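A quick sanity check of that fix (the variable names are mine, just for illustration):
// illustrative test of the second regexp with \= added to the path character class
var fixedPattern = /(((https|http|ftp?):\/\/)?(www\.)?([a-zA-Z0-9\-]{1,}\.){1,}[a-zA-Z0-9]{1,4}(:[0-9]{1,5})?(\/[a-zA-Z0-9\-\_\.\?\&\#\=]{1,})*(\/)?$)/gim;
var url = "https://pw.something.com/AAPS/default.aspx?guid=a5741c35-6fe1-31a1-b555-4028e931642b";
console.log(url.replace(fixedPattern, "<a href='$1'>$1</a>"));
// -> the whole URL, wrapped in an anchor tag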
Upvotes: 0
Reputation: 5122
I use this one ...
^(http|https|ftp)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?\/?([a-zA-Z0-9\-\._\?\,\'\/\\\+&%\$#\=~])*$
Look here ... Regex Tester
URL RegExp that requires (http, https, ftp)://, a nice domain, and a decent file/folder string. It allows : after the domain name, and these characters in the file/folder string (letters, numbers, - . _ ? , ' / \ + & % $ # = ~). It blocks all other special characters and is good for validating user input!
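For instance, wrapped in a small validation function (the function name is mine, just for illustration):
// hypothetical wrapper around the pattern above
function isLikelyUrl(input) {
    // requires an explicit http/https/ftp scheme, so bare "www.example.com"-style input is rejected
    var pattern = /^(http|https|ftp)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?\/?([a-zA-Z0-9\-\._\?\,\'\/\\\+&%\$#\=~])*$/;
    return pattern.test(input);
}
console.log(isLikelyUrl("https://pw.something.com/AAPS/default.aspx?guid=a5741c35-6fe1-31a1-b555-4028e931642b")); // true
console.log(isLikelyUrl("www.something.com")); // false - no scheme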
Upvotes: 1