Sarit Rotshild

Reputation: 391

URL validation RegExp recognizes an email address as a URL

I have to recognize URLs in some text. I use the following code (this.value is the text):

if (new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(this.value)) {
    alert("url inside");
}

The problem is that it also recognizes an email address as a URL. How can I prevent that?

Upvotes: 1

Views: 2315

Answers (1)

Dmitry Sokolov

Reputation: 3180

The expression /[a-zA-Z0-9_]/ is the same as /\w/ (with or without the i flag), so the RegExp can be shortened.

The original RegExp matches the "domain.org" substring in a text like "text user@domain.org text mailto:user@domain.org text". To fix this, add (?:^|[^@\.\w-]) at the beginning of the RegExp: a match must then start either at the beginning of a line or right after a character that is not '@', '.', '-' or a word character (\w).
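As a rough illustration of the leading guard, the sketch below compares only the host part of the pattern with and without it. The variable names and the sample text are made up for the demo, and console.log stands in for alert.

```javascript
// Host part of the question's pattern, without any guard.
var hostOnly = /[\w.-]+\.[a-z]{2,4}/i;
// Same host part behind the leading guard (?:^|[^@.\w-]).
var guardedHost = /(?:^|[^@.\w-])[\w.-]+\.[a-z]{2,4}/i;

// Hypothetical sample text; the address is invented for the demo.
var mailText = "write to user@domain.org";

console.log(hostOnly.test(mailText));    // true: "domain.org" is found after the '@'
console.log(guardedHost.test(mailText)); // false: a match may not start right after '@'
console.log(guardedHost.test("see domain.org now")); // true: a plain host still matches
```

The guard consumes one character in front of the host, so a match begins with that character (for example a space) unless it starts at the beginning of a line.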

To exclude "mailto:user@..." substrings, the expression ([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)? has to be modified. JavaScript RegExp had no look-behind expressions before ES2018, so the portable way to exclude "mailto" is the look-ahead expression \w(?!ailto:)\w+:, which rejects a first character that is followed by "ailto:". The side effect is that every substring of the form "[a-zA-Z0-9_]ailto:...@..." is excluded as well.
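A small sketch of the look-ahead in isolation may help. The credentials part is anchored with ^ here, which is an assumption made only for this demo; in the full RegExp the leading guard plays that role. The sample strings are invented.

```javascript
// Credentials part with the look-ahead; ^ is added only for this isolated demo.
var credentials = /^\w(?!ailto:)\w+:\w+@/i;

console.log(credentials.test("admin:secret@domain.org")); // true: ordinary user:password@
console.log(credentials.test("mailto:user@domain.org"));  // false: 'm' is followed by "ailto:"
console.log(credentials.test("sailto:user@domain.org"));  // false: the side effect described above
```

Without the anchor (or the guard), the engine would simply retry one character later and match "ailto:user@", so the look-ahead only works together with a restriction on where a match may start.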

To keep the substring "user.name" in a text like "text user.name@domain.org text" from matching, add the expression (?=$|[^@\.\w-]) at the end of the RegExp: a substring matches only if it is followed by the end of a line or by a character other than '@', '.', '-' and \w.
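The effect of the trailing look-ahead can be sketched on the host part alone. The names withoutTail, withTail and dottedText are hypothetical, and the address is a made-up sample.

```javascript
// Leading guard only: a dotted local part can still pass as a host.
var withoutTail = /(?:^|[^@.\w-])[\w.-]+\.[a-z]{2,4}/i;
// Leading guard plus the trailing look-ahead (?=$|[^@.\w-]).
var withTail = /(?:^|[^@.\w-])[\w.-]+\.[a-z]{2,4}(?=$|[^@.\w-])/i;

// Hypothetical sample with a dot in the local part of the address.
var dottedText = "text user.name@domain.org text";

console.log(withoutTail.exec(dottedText)[0]); // " user.name" (with the leading space)
console.log(withTail.test(dottedText));       // false: "user.name" is followed by '@'
console.log(withTail.test("text domain.org text")); // true: a plain host still matches
```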

var re = /(?:^|[^@\.\w-])([a-z0-9]+:\/\/)?(\w(?!ailto:)\w+:\w+@)?([\w.-]+\.[a-z]{2,4})(:[0-9]+)?(\/.*)?(?=$|[^@\.\w-])/im;

//if (re.test(this.value)) {
//    alert("url inside");
//}

var s1 = "text user@domain.org user.name@domain.org text mailto:user@domain.org text";
if (re.test(s1)) {
    alert("Failed: text without URL");
}

var s2 = "text http://domain.org/ text";
if (!re.test(s2)) {
    alert("Failed: text with URL");
}

alert("OK");
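To pull the parts of a URL out of a match rather than just test for one, the capture groups can be read from exec. This is a minimal sketch with an invented sample string; console.log is used instead of alert so it also runs outside a browser.

```javascript
// Same RegExp as above.
var re = /(?:^|[^@\.\w-])([a-z0-9]+:\/\/)?(\w(?!ailto:)\w+:\w+@)?([\w.-]+\.[a-z]{2,4})(:[0-9]+)?(\/.*)?(?=$|[^@\.\w-])/im;

// Hypothetical sample text.
var m = re.exec("text http://domain.org:8080/path text");

console.log(m[1]); // "http://"    scheme
console.log(m[2]); // undefined    no user:password@ part
console.log(m[3]); // "domain.org" host
console.log(m[4]); // ":8080"      port
console.log(m[5]); // "/path text" path; .* runs to the end of the line
```

Two things to keep in mind: m[0] starts with the character consumed by the leading guard (a space here), and the path group is greedy, so it also swallows any text that follows the URL on the same line.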

Upvotes: 2
