Reputation: 1944
I have a very loose regex to match any kind of URL inside a string: [a-z]+[:.].*?(?=\s|$)
The only problem is that this regex also matches the domain part of an email address, whereas I want to exclude email addresses from the match entirely.
To be precise, this is the behavior I want: in the first line below example.com should match, and in the second line nothing should match.
test example.com test
test emailstring@myemail.com
Every solution I tried just excludes emailstring
and still matches myemail.com.
Here's a more complete test case: https://regex101.com/r/NsxzCM/3/
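One possible single-regex variant of the loose pattern above, sketched under the assumption of a JavaScript engine with lookbehind support (ES2018 or later). It keeps the original pattern and puts two lookarounds in front of it. (?<!\S) forces a match to start at the beginning of a whitespace-delimited token. (?!\S*@) then rejects any token that contains an @. The helper name findNonEmailUrls is hypothetical.

```javascript
// The question's loose pattern, with two lookarounds added in front:
//   (?<!\S)  the match must start at the beginning of the input or right after whitespace
//   (?!\S*@) the whitespace-delimited token starting here must not contain "@"
const LOOSE_URL_NO_EMAIL = /(?<!\S)(?!\S*@)[a-z]+[:.].*?(?=\s|$)/g;

// Hypothetical helper: returns every URL-like token that is not an email address
function findNonEmailUrls(text) {
  // match() with the g flag returns all matches, or null when there are none
  return text.match(LOOSE_URL_NO_EMAIL) || [];
}

console.log(findNonEmailUrls("test example.com test")); // [ 'example.com' ]
console.log(findNonEmailUrls("test emailstring@myemail.com")); // []
```

The check is applied per whitespace-delimited token, so a URL that itself contains @ (for example http://user@example.com) is skipped as well.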
Upvotes: 1
Views: 3694
Reputation: 269
I think you need something like this:
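// Matches URL-like strings. Note that "(http(s)?):\/\/(www\.)?" sits inside the [...]
// character class, so it only adds individual characters to the set; it does not form groups.
const URL_INCLUDE_REGEX = /[(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/ig;
// Anything containing "@" is treated as an email address
const URL_EXCLUDE_REGEX = /.*@.*/;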
The second regex is used to exclude email addresses, so the final code is:
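const text = "My website is example.com";
// const text = "My email is [email protected]"; <- not matched: this is an email address, not a URL
// match() with the g flag returns all URL-like candidates, or null if there are none
const candidates = text.match(URL_INCLUDE_REGEX) || [];
// result is true when at least one candidate does not look like an email address
const result = candidates.some((matchedText) => !URL_EXCLUDE_REGEX.test(matchedText));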
Here result will be true for the website sentence and false for the email sentence.
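If you need the matched URLs themselves rather than a true/false result, the two regexes from this answer can be combined with Array.prototype.filter. This is a sketch that reuses both patterns unchanged; the helper name extractNonEmailUrls is hypothetical.

```javascript
// Patterns copied unchanged from the answer above
const URL_INCLUDE_REGEX = /[(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/ig;
const URL_EXCLUDE_REGEX = /.*@.*/;

// Hypothetical helper: every URL-like candidate that does not contain "@"
function extractNonEmailUrls(text) {
  // match() with the g flag returns all candidates, or null when there are none
  const candidates = text.match(URL_INCLUDE_REGEX) || [];
  // Drop candidates that look like email addresses
  return candidates.filter((candidate) => !URL_EXCLUDE_REGEX.test(candidate));
}

console.log(extractNonEmailUrls("Visit example.com or write to emailstring@myemail.com")); // [ 'example.com' ]
```

Because URL_EXCLUDE_REGEX has no g flag, calling test() repeatedly carries no lastIndex state between candidates.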
Upvotes: 0
Reputation: 11
(:^|[^@\.\w-])([-\w:.]{1,256}\.[\w()]{1,6}\b)
helps, but I don't know why it also matches an extra \ character.
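A likely explanation for the extra character: [^@\.\w-] is an ordinary character class, so it consumes the character just before the URL (a space or newline, for example). That character then becomes part of the overall match, even though the URL itself is only in group 2. A negative lookbehind checks the preceding character without consuming it. The sketch below assumes the first group was meant to be (?:^|[^@\.\w-]) and that the engine supports lookbehind (ES2018 or later). The constant name NO_EMAIL_DOMAIN_REGEX is hypothetical.

```javascript
// (?<![@.\w-]) asserts that the previous character (if there is one) is not "@", ".",
// a word character or "-", without consuming it, so the match starts exactly at the URL.
// It stands in for a leading (?:^|[^@\.\w-]) group, which has to consume that character.
const NO_EMAIL_DOMAIN_REGEX = /(?<![@.\w-])[-\w:.]{1,256}\.[\w()]{1,6}\b/g;

const sample = "test example.com test\ntest emailstring@myemail.com";
console.log(sample.match(NO_EMAIL_DOMAIN_REGEX)); // [ 'example.com' ]
```

In this version the match never includes the preceding space or newline. myemail.com is rejected because the character before it is @.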
Upvotes: 0
Reputation: 44957
Here is a two-step approach that uses a regex replace with a callback function.
The first regex finds everything that looks like an ordinary URL or an email, and the second regex then filters out the strings that look like email addresses:
const input =
"test\n" +
"example.com\n" +
"www.example.com\n" +
"test sub.example.com test\n" +
"http://example.com\n" +
"test http://www.example.com test\n" +
"http://sub.example.com\n" +
"https://example.com\n" +
"https://www.example.com\n" +
"https://sub.example.com\n" +
"\n" +
"test [email protected] <- i don't want to match this\n" +
"[email protected] <- i don't want to match this\n" +
"\n" +
"git://github.com/user/project-name.git\n" +
"irc://irc.undernet.org:6667/mIRC jhasbdjkbasd\n";
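// URL-like tokens (emails included) that are followed by whitespace or the end of input
const includeRegex = /(?:[\w/:@-]+\.[\w/:@.-]*)+(?=\s|$)/g;
// Anything containing "@" is treated as an email address
const excludeRegex = /.*@.*/;
const result = input.replace(includeRegex, function (s) {
  if (excludeRegex.test(s)) {
    return s; // looks like an email: leave as-is
  } else {
    return "(that's a non-email url: " + s + ")";
  }
});
console.log(result);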
Upvotes: 5