Reputation: 3757
My current regex looks for a searchQuery inside sentences and matches it if the query starts with a blank space and ends with either a blank space or one of ?!,. characters. It generally works well, except for URLs: the regex ends up matching inside URLs and messing them up.
For example, if I was looking for "bitcoin" in the sentence "Bitcoin price is going nuts", it would find it, but it would also match inside the following URL and mess it up:
https://versionone.vc/the-solar-bitcoin-convergence
How can I tell a JavaScript regex to ignore any match where the character before the matched word is one of / - . _ + ? That should essentially eliminate matches inside URLs.
Current Regex:
var reg = new RegExp('(\\b)${searchQuery}(\\s+|\\.|\\,|\\?|\\!', 'gi');
Replacement function:
newString = oldString.replace(reg, substringReplacement);
substringReplacement(match) is a function that contains the logic of how to change the matching text.
Alternatively, is there another way to exclude URLs from the searchable text entirely? Thanks!
Upvotes: 0
Views: 1225
Reputation: 3757
Although the other answers here are more correct as far as regex is concerned, negative lookbehind isn't supported by Safari (before version 16.4), so for now I have come up with a workaround. Instead of looking behind and trying to negate the preceding string, I look ahead and reject matches that are most likely part of a URL.
${searchQuery}(?!-|\/|\.com)
will skip a large fraction of URLs, unless the searchQuery word is the last word in the URL.
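A minimal sketch of that lookahead-only workaround, for anyone who wants to try it. The helpers escapeRegExp and buildLookaheadRegex are hypothetical names, not from the post, and the sketch adds \b on both sides of the term (an assumption, to avoid matching "bitcoins"). It also shows the limitation mentioned above: a term at the very end of a URL is still matched.

```javascript
// Hypothetical helper (not from the post): escape regex metacharacters
// so a user-supplied searchQuery is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical builder: lookahead only, no lookbehind, so it also compiles
// on engines without lookbehind support. A match is rejected when the term
// is immediately followed by "-", "/" or ".com".
function buildLookaheadRegex(searchQuery) {
  return new RegExp(`\\b${escapeRegExp(searchQuery)}\\b(?!-|\\/|\\.com)`, 'gi');
}

const reg = buildLookaheadRegex('bitcoin');
const text = 'Bitcoin price is going nuts https://versionone.vc/the-solar-bitcoin-convergence';
console.log(text.replace(reg, (m) => m.toUpperCase()));
// prints "BITCOIN price is going nuts https://versionone.vc/the-solar-bitcoin-convergence"

// Known limitation: when the term is the last word of the URL, nothing follows it,
// so the lookahead cannot reject it.
console.log('https://example.com/the-bitcoin'.replace(reg, (m) => m.toUpperCase()));
// prints "https://example.com/the-BITCOIN"
```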
When I find the perfect answer, I will post it here.
Upvotes: 0
Reputation: 786091
Modern JavaScript supports variable-length lookbehind assertions, so you may try:
var reg = new RegExp(`(?<!https?:\/\/\\S*)\\b${searchQuery}[\\s.,?!]`, 'gi');
(?<!https?:\/\/\S*) is a negative lookbehind that fails the match if http:// or https:// followed by zero or more non-whitespace characters appears immediately before it.
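A small runnable sketch of the lookbehind approach, assuming an engine with lookbehind support (e.g. a recent Node or Chrome). escapeRegExp and buildLookbehindRegex are hypothetical helpers, and unlike the answer, the trailing delimiter is written as a lookahead that also allows end of input (an assumption), so only the word itself is replaced. The URL term in the example is followed by a space, so it is the lookbehind, not the delimiter, that rejects it.

```javascript
// Hypothetical helper (not from the answer): escape regex metacharacters in the query.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Variable-length negative lookbehind: reject the match if "http://" or "https://"
// plus any run of non-whitespace characters ends right where the term begins.
// The trailing delimiter is a lookahead here, so it is checked but not consumed.
function buildLookbehindRegex(searchQuery) {
  return new RegExp(
    `(?<!https?:\\/\\/\\S*)\\b${escapeRegExp(searchQuery)}(?=[\\s.,?!]|$)`,
    'gi'
  );
}

const reg = buildLookbehindRegex('bitcoin');
const text = 'Bitcoin is going nuts, see https://example.com/the-bitcoin about bitcoin.';
console.log(text.replace(reg, (m) => m.toUpperCase()));
// prints "BITCOIN is going nuts, see https://example.com/the-bitcoin about BITCOIN."
```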
Upvotes: 2
Reputation: 371168
I'd use an alternation that matches either the format of a URL or the searchQuery pattern, then use a replacer function to check which of the two was matched. If it was the URL, replace it with the URL itself (so that nothing gets replaced in that case).
You'll also need to use backticks for a template literal if you want ${}-style interpolation.
// make this as elaborate as you want:
// https://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url
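// Alternative 1 captures a URL; alternative 2 is the term followed by whitespace or punctuation
var reg = new RegExp(`(https?:\\/\\/\\S+)|\\b${searchQuery}(?:\\s+|[.,?!])`, 'gi');
// If group 1 matched, return the URL unchanged; otherwise call the replacement function
newString = oldString.replace(reg, (match, g1) => g1 ? match : substringReplacement(match));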
You also need to make sure the () groups are balanced (in your current code they aren't, so the new RegExp call will currently throw a SyntaxError).
The substringReplacement function isn't shown, but unless you're using the groups in the replacement, you can probably omit the capturing groups entirely, except for the URL group.
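A self-contained sketch of the URL-or-term alternation with a replacer function. escapeRegExp and replaceOutsideUrls are hypothetical helpers, and substringReplacement here is only a stand-in that wraps the term in <mark> (the asker's real function isn't shown). The sketch also uses \b on both sides of the term instead of the trailing whitespace/punctuation group, which is an assumption. Because a URL is consumed as one match, the term inside it is never matched on its own.

```javascript
// Hypothetical helper (not from the answer): escape regex metacharacters in the query.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical stand-in for the asker's substringReplacement: wraps the term in <mark>.
function substringReplacement(match) {
  return `<mark>${match}</mark>`;
}

// Alternative 1 (group 1) matches a whole URL; alternative 2 matches the term.
// The regex engine consumes a URL as a single match, so the term inside it is skipped.
function replaceOutsideUrls(text, searchQuery) {
  const reg = new RegExp(`(https?:\\/\\/\\S+)|\\b${escapeRegExp(searchQuery)}\\b`, 'gi');
  // url is undefined when alternative 2 matched
  return text.replace(reg, (match, url) => (url ? match : substringReplacement(match)));
}

console.log(replaceOutsideUrls('Bitcoin price: https://versionone.vc/the-solar-bitcoin-convergence', 'bitcoin'));
// prints "<mark>Bitcoin</mark> price: https://versionone.vc/the-solar-bitcoin-convergence"
```

Compared with the lookbehind answer, this approach works on engines without lookbehind support, at the cost of depending on how well the URL pattern recognizes URLs.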
Upvotes: 1