Bolza

Reputation: 1944

Regex: match a url but not an email domain

I have a very loose regex, [a-z]+[:.].*?(?=\s|$), to match any kind of URL inside a string. The only problem is that it also matches the domain of an email address, whereas I want to exclude email addresses from the match entirely.

To be precise, I want the following matches:

test example.com test (should match example.com)

test [email protected] (should match nothing)

Every solution I tried just excludes emailstring and still matches myemail.com.

Here's a more complete test case: https://regex101.com/r/NsxzCM/3/
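The behaviour described above can be reproduced with a short sketch. The address user@example.org is a placeholder chosen for illustration, not the one from the question, and the g flag is added only so that every match is listed.

```javascript
// The loose pattern from the question, with the global flag added so that
// String.prototype.match returns every match instead of only the first.
const looseUrlRegex = /[a-z]+[:.].*?(?=\s|$)/g;

// "user@example.org" is a placeholder address used for illustration only.
const plainText = "test example.com test";
const emailText = "test user@example.org";

const plainMatches = plainText.match(looseUrlRegex);
const emailMatches = emailText.match(looseUrlRegex);

console.log(plainMatches); // [ 'example.com' ]  (wanted)
console.log(emailMatches); // [ 'example.org' ]  (unwanted: the email's domain)
```

The local part of the address is skipped because "@" is not in [:.], but the scan then restarts right after the "@" and picks up the domain as if it were a standalone URL.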

Upvotes: 1

Views: 3694

Answers (3)

David Gabrielyan

Reputation: 269

I think you need something like this:

const URL_INCLUDE_REGEX = /[(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/ig;
const URL_EXCLUDE_REGEX = /.*@.*/;

The second one is for excluding emails. So the final code will be:

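// Returns true if text contains a URL-like match that is not an email address.
function containsNonEmailUrl(text) {
  let result = false;

  // replace() is used here only to visit every match; its return value is discarded.
  text.replace(URL_INCLUDE_REGEX, (matchedText) => {
    if (!URL_EXCLUDE_REGEX.test(matchedText)) {
      result = true;
    }
    return matchedText;
  });
  return result;
}

const result = containsNonEmailUrl("My website is example.com");
// With "My email is [email protected]" instead, the match is an email, not a URL, so it is excluded.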

where result will be true or false

Upvotes: 0

Harshil Patel

Reputation: 11

(?:^|[^@\.\w-])([-\w:.]{1,256}\.[\w()]{1,6}\b)

This helps, but I don't know why it also matches an extra \.
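A leading group such as [^@\.\w-] consumes the character in front of the URL, so that character becomes part of the overall match; this may be where the extra character comes from. A minimal sketch of one alternative, assuming a runtime with lookbehind support (ES2018 or later): a negative lookbehind checks the preceding character without consuming it, and a negative lookahead rejects the local part of an address. The pattern and the sample text are illustrative and are not taken from the answer above.

```javascript
// Assumption: the runtime supports lookbehind assertions (ES2018+).
// (?<![@\w.-])   the match must not start right after "@" or inside a word
// (?![\w.-]*@)   the match must not be followed by the rest of an email local part
const urlNotEmailRegex = /(?<![@\w.-])[-\w:./]+\.[a-z]{2,6}\b(?![\w.-]*@)/gi;

// Placeholder text; "user@example.org" is an illustrative address.
const sampleText = "test example.com test user@example.org and http://sub.example.com";

const foundUrls = sampleText.match(urlNotEmailRegex) || [];
console.log(foundUrls); // [ 'example.com', 'http://sub.example.com' ]
```

Because both assertions are zero-width, the full match is already the URL itself and no capture group has to be unwrapped. This simplified pattern stops at the last dot-suffix it finds, so it is not a replacement for the fuller patterns in the other answers.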

Upvotes: 0

Andrey Tyukin

Reputation: 44957

Here is a two-step proposal that uses regex replace with lambdas. The first regex finds everything that looks like an ordinary URL or an email, and the second regex then filters out the strings that look like email addresses:

const input =
  "test\n" +
  "example.com\n" +
  "www.example.com\n" +
  "test sub.example.com test\n" +
  "http://example.com\n" +
  "test http://www.example.com test\n" +
  "http://sub.example.com\n" +
  "https://example.com\n" +
  "https://www.example.com\n" +
  "https://sub.example.com\n" +
  "\n" +
  "test [email protected] <- i don't want to match this\n" +
  "[email protected]    <- i don't want to match this\n" +
  "\n" +
  "git://github.com/user/project-name.git\n" +
  "irc://irc.undernet.org:6667/mIRC jhasbdjkbasd\n";

const includeRegex = /(?:[\w/:@-]+\.[\w/:@.-]*)+(?=\s|$)/g;
const excludeRegex = /.*@.*/;

const result = input.replace(includeRegex, function(s) {
  if (excludeRegex.test(s)) {
    return s; // leave as-is
  } else {
    return "(that's a non-email url: " + s +")";
  }
});

console.log(result);
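If a list of the URLs is wanted rather than an annotated string, the same two-step idea can be sketched with match and filter. The helper name extractNonEmailUrls and the sample text are hypothetical; the include pattern is the one from the snippet above.

```javascript
// Hypothetical helper, not part of the original answer: returns every
// URL-like token that does not contain "@".
function extractNonEmailUrls(text) {
  // Step 1: same include pattern as above; it matches URLs and emails alike.
  const candidateRegex = /(?:[\w/:@-]+\.[\w/:@.-]*)+(?=\s|$)/g;
  const candidates = text.match(candidateRegex) || [];
  // Step 2: drop every candidate that looks like an email address.
  return candidates.filter((candidate) => !candidate.includes("@"));
}

// "user@example.org" is a placeholder address used for illustration only.
const sample = "test sub.example.com test\nuser@example.org\nhttps://www.example.com";
const urls = extractNonEmailUrls(sample);
console.log(urls); // [ 'sub.example.com', 'https://www.example.com' ]
```

Checking for "@" with includes is equivalent to testing against /.*@.*/ and avoids a second regex.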

Upvotes: 5
