SW4

Reputation: 71170

Regular Expression to match URLs / web addresses

I have a JS function which is passed a string that a RegEx is run against, and returns any matches:

searchText= // some string which may or may not contain URLs
Rxp= new RegExp("([a-zA-Z\d]+://)?(\w+:\w+@)?([a-zA-Z\d.-]+\.[A-Za-z]{2,4})(:\d+)?(/.*)?/ig")
return searchText.match(Rxp);

The RegExp should return matches for any of the following (and similar derivations):

However, no such luck. Any suggestions?

Upvotes: 1

Views: 154

Answers (2)

Rob W

Reputation: 349102

In a string, \ has to be escaped: \\.

First, the string literal is parsed. \w turns into w, because \w is not a recognized string escape sequence.
Then, the parsed string is turned into a RegExp. But the \ was already lost during string parsing, so your RegExp breaks.
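
To see the two parsing steps in isolation, here is a minimal sketch. The sample strings and variable names are made up for illustration and are not part of the question's code:

```javascript
// Step 1: the string literal is parsed. "\w" and "\d" are not recognized
// string escapes, so the backslash is silently dropped.
const lost = "\w+\d";
const kept = "\\w+\\d";

console.log(lost); // w+d
console.log(kept); // \w+\d

// Step 2: the already-parsed string becomes the pattern.
const broken = new RegExp("\d+");   // pattern is d+, matches the letter "d"
const working = new RegExp("\\d+"); // pattern is \d+, matches digits

console.log(broken.source);         // d+
console.log(working.source);        // \d+
console.log(broken.test("12345"));  // false
console.log(working.test("12345")); // true
```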

Instead of using the RegExp constructor, use RegEx literals:

Rxp = /([a-zA-Z\d]+:\/\/)?(\w+:\w+@)?([a-zA-Z\d.-]+\.[A-Za-z]{2,4})(:\d+)?(\/.*)?/ig;
// Note: I recommend using a different variable name. Variable names starting with a
//  capital letter usually indicate a constructor, by convention.

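If you're not 100% sure that the input is a string, it's better to use the exec method, which coerces its argument to a string. Note that exec returns a single match per call (with its capture groups), whereas match with the g flag returns all full matches at once: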

return Rxp.exec(searchText);
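
As a rough illustration of how match and exec differ, the sketch below uses a deliberately simple digit pattern rather than the URL pattern. The helper execAll is a hypothetical name introduced here, not something from the question:

```javascript
const digits = /\d+/g; // illustrative pattern only

// With the g flag, match returns every full match at once.
const viaMatch = "a1 b22 c333".match(digits); // ["1", "22", "333"]

// exec returns one match per call and advances lastIndex,
// so collecting all matches takes a loop.
function execAll(regex, input) {
  if (!regex.global) {
    // Without g, exec always returns the first match and the loop never ends.
    throw new TypeError("execAll expects a regex with the g flag");
  }
  regex.lastIndex = 0; // start from the beginning of the input
  const found = [];
  let m;
  while ((m = regex.exec(input)) !== null) {
    found.push(m[0]);
    // Guard against zero-length matches looping forever.
    if (m[0].length === 0) regex.lastIndex++;
  }
  return found;
}

const viaExec = execAll(digits, "a1 b22 c333"); // ["1", "22", "333"]

// exec coerces a non-string argument to a string. match is a String
// method, so it does not exist on a number at all.
const fromNumber = execAll(digits, 2024); // ["2024"]

console.log(viaMatch, viaExec, fromNumber);
```

If only the list of full matches is needed and the input is known to be a string, match with the g flag is the shorter option; exec is useful when capture groups or non-string input matter.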

Here's a pattern which includes the query string and URL fragment:

/([a-zA-Z\d]+:\/\/)?(\w+:\w+@)?([a-zA-Z\d.-]+\.[A-Za-z]{2,4})(:\d+)?(\/[^?#\s]*)?(\?[^#\s]*)?(#\S*)?/ig
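
To show which capture group picks up which part, here is a small sketch that runs this pattern against one URL. The sample URL and the variable names are hypothetical, and the g flag is dropped because only a single exec call is made:

```javascript
// Same pattern as above, with only the i flag since a single URL is inspected.
const urlPattern = /([a-zA-Z\d]+:\/\/)?(\w+:\w+@)?([a-zA-Z\d.-]+\.[A-Za-z]{2,4})(:\d+)?(\/[^?#\s]*)?(\?[^#\s]*)?(#\S*)?/i;

// Hypothetical sample URL, for illustration only.
const sample = "http://user:pw@example.com:8080/path/page?x=1&y=2#top";

const [whole, scheme, credentials, host, port, path, query, fragment] =
  urlPattern.exec(sample);

console.log(scheme);      // http://
console.log(credentials); // user:pw@
console.log(host);        // example.com
console.log(port);        // :8080
console.log(path);        // /path/page
console.log(query);       // ?x=1&y=2
console.log(fragment);    // #top

// Optional groups that do not take part in the match are undefined.
const bare = urlPattern.exec("example.org");
console.log(bare[3]); // example.org
console.log(bare[1]); // undefined
```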

Upvotes: 3

Mitya

Reputation: 34576

Firstly, there's no real need to create your pattern via the RegExp constructor since it doesn't contain anything dynamic. You can just use the literal /pattern/ instead.

If you do use the constructor, though, you have to remember your pattern is declared as a string, not a regex literal, so you'll need to double-escape special characters, e.g. \\d, not \d. Forward slashes, by contrast, only need escaping in a regex literal, where an unescaped / ends the pattern; in a constructor string, escaping them is optional but harmless.

With the constructor, modifiers (g, i) are passed as a second argument, not appended to the pattern.
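
A short sketch of what goes wrong when the flags are appended to the pattern string, using a made-up pattern "abc" for illustration:

```javascript
// Flags written inside the pattern string are just literal characters.
const wrong = new RegExp("abc/ig");
// Flags passed as the second argument are applied to the pattern.
const right = new RegExp("abc", "ig");

console.log(wrong.flags);          // (empty string, no flags set)
console.log(wrong.test("ABC"));    // false
console.log(wrong.test("abc/ig")); // true, "/ig" is matched literally
console.log(right.flags);          // gi
console.log(right.test("ABC"));    // true
```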

So a direct fix of what you have would be:

Rxp= new RegExp("([a-zA-Z\\d]+:\\/\\/)?(\\w+:\\w+@)?([a-zA-Z\\d.-]+\\.[A-Za-z]{2,4})(:\\d+)?(\\/.*)?", "ig")

But better would be:

Rxp = /([a-zA-Z\d]+:\/\/)?(\w+:\w+@)?([a-zA-Z\d.-]+\.[A-Za-z]{2,4})(:\d+)?(\/.*)?/gi;
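
As a quick check that the two forms describe the same pattern, the sketch below compares them and runs the literal against a made-up input string. The variable names and the sample text are assumptions for illustration:

```javascript
const fromConstructor = new RegExp(
  "([a-zA-Z\\d]+:\\/\\/)?(\\w+:\\w+@)?([a-zA-Z\\d.-]+\\.[A-Za-z]{2,4})(:\\d+)?(\\/.*)?",
  "ig"
);
const fromLiteral = /([a-zA-Z\d]+:\/\/)?(\w+:\w+@)?([a-zA-Z\d.-]+\.[A-Za-z]{2,4})(:\d+)?(\/.*)?/gi;

console.log(fromConstructor.source === fromLiteral.source); // true
console.log(fromConstructor.flags === fromLiteral.flags);   // true

// Hypothetical input, for illustration only.
const text = "see example.com or http://user:pw@host.org:8080 now";
const matches = text.match(fromLiteral);
console.log(matches); // ["example.com", "http://user:pw@host.org:8080"]
```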

Upvotes: 1
