Kyle

Reputation: 5547

How to ignore characters surrounding a URL in regex

I have the following regex:

var URL_REGEX = /(^|[\s\n]|<br\/?>)((?:(?:https?|ftp):\/\/)?[\-A-Z0-9\u00A0-\uD7FF\uE000-\uFDCF\uFDF0-\uFFFD+\u0026\u2019@#\/%?=()~_|!:,.;]*[\-A-Z0-9+\u0026@#\/%=~()_|])/gi;

I am able to capture the URL in the following correctly:

var someString1 = "hello http://stackoverflow.com";
var someString2 = "hello www.stackoverflow.com";
var someString3 = "hello stackoverflow.com";
var someString4 = "hello stackoverflow.com?foo=bar&foo=baz&foo-bar=baz";

But suppose I have

var wrappedUrl = "hello (www.stackoverflow.com)";

the match includes the surrounding parentheses, which I don't want. How do I capture only the URL?

With square brackets, the URL is not captured at all. I get no match:

var wrappedUrl = "hello [www.stackoverflow.com]";

Upvotes: 2

Views: 690

Answers (2)

Wiktor Stribiżew

Reputation: 626806

You can use

/((https?|ftp)\:\/\/)?([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?([a-z0-9-.]*)\.([a-z]{2,4})(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?/gi

See the regex demo

Explanation:

  • ((https?|ftp)\:\/\/)? - Scheme
  • ([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)? - Username and password
  • ([a-z0-9-.]*)\.([a-z]{2,4}) - Host name and top-level domain (2 to 4 letters)
  • (\:[0-9]{2,5})? - Port number
  • (\/([a-z0-9+\$_-]\.?)+)*\/? - Path
  • (\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)? - GET query
  • (#[a-z_.-][a-z0-9+\$_.-]*)? - anchor
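The capture groups in the pattern line up with the parts listed above, so a match can be split into its components. A minimal sketch, assuming a hypothetical helper named `splitUrl` (not part of the answer) and using the same pattern without the `g` flag, since only the first match is needed:

```javascript
// Same pattern as in the answer, minus the g flag.
var URL_PARTS = /((https?|ftp)\:\/\/)?([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?([a-z0-9-.]*)\.([a-z]{2,4})(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?/i;

// splitUrl is a hypothetical name used only for this illustration.
function splitUrl(text) {
    var m = URL_PARTS.exec(text);
    if (m === null) {
        return null;
    }
    return {
        url: m[0],       // the whole match, without surrounding brackets
        scheme: m[2],    // undefined when no scheme is present
        host: m[5],      // everything before the last dot of the host
        tld: m[6],
        port: m[7],      // includes the leading ":" when present
        query: m[10],    // includes the leading "?" when present
        fragment: m[11]  // includes the leading "#" when present
    };
}

console.log(splitUrl("hello (www.stackoverflow.com)").url); // www.stackoverflow.com
```

Optional parts that do not occur in the input come back as `undefined`, so callers should check for that before using them.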

See the JS demo:

var re = /((https?|ftp)\:\/\/)?([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?([a-z0-9-.]*)\.([a-z]{2,4})(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?/gi; 
var str = `hello http://stackoverflow.com
hello www.stackoverflow.com
hello stackoverflow.com
hello stackoverflow.com?foo=bar&foo=baz&foo-bar=baz
hello [www.stackoverflow.com]
hello (www.stackoverflow.com)`;
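
// Declare the match variable so the loop does not create an implicit global
var m;
while ((m = re.exec(str)) !== null) {
    // m[0] is the full match, without the surrounding brackets or parentheses
    document.body.innerHTML += m[0] + "<br/>";
}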
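The demo above writes to `document.body`, so it only runs in a browser. As a sketch of the same idea outside the DOM (for example in Node.js), the matches can be collected into an array; `collectUrls` is a hypothetical helper name, not something from the answer:

```javascript
var re = /((https?|ftp)\:\/\/)?([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?([a-z0-9-.]*)\.([a-z]{2,4})(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?/gi;

// collectUrls is a hypothetical helper name used only for this illustration.
function collectUrls(text) {
    // With the g flag, String.prototype.match returns every full match,
    // or null when nothing matches.
    return text.match(re) || [];
}

var found = collectUrls("hello (www.stackoverflow.com) and [www.stackoverflow.com]");
console.log(found); // two entries, both "www.stackoverflow.com"
```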

Upvotes: 2

Yan Pak

Reputation: 1867

I tried this regular expression:

/((http|https|ftp):?\/\/)?[a-z-A-Z]*(\.[a-z-A-Z]*)+(\?([a-z-A-Z0-9_]+=[a-z-A-Z0-9_]+(&)?)*)?/

It works perfectly in all the cases you have shown.
Still, it is worth reading through a RegExp reference and trying to build the expression from scratch on your own.
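A quick way to check this pattern against the strings from the question is to run each one through it and print the first match. This is only a sketch; `firstUrl` and `samples` are hypothetical names introduced for the illustration:

```javascript
var SHORT_RE = /((http|https|ftp):?\/\/)?[a-z-A-Z]*(\.[a-z-A-Z]*)+(\?([a-z-A-Z0-9_]+=[a-z-A-Z0-9_]+(&)?)*)?/;

// firstUrl is a hypothetical helper: returns the first match or null.
function firstUrl(text) {
    var m = text.match(SHORT_RE);
    return m ? m[0] : null;
}

// The sample strings from the question.
var samples = [
    "hello http://stackoverflow.com",
    "hello www.stackoverflow.com",
    "hello stackoverflow.com",
    "hello stackoverflow.com?foo=bar&foo=baz&foo-bar=baz",
    "hello (www.stackoverflow.com)",
    "hello [www.stackoverflow.com]"
];

samples.forEach(function (s) {
    console.log(s + " -> " + firstUrl(s));
});
```

Note that the host character classes here contain no digits, so host names with numbers in them would need the classes extended.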

Upvotes: 0
