Reputation: 3018
I have a string like the one below:
Hello there how are you?
I want to look for the substring 'there how' in the string, so I would do something like this:
var str = "Hello there how are you?";
var term = "there how";
var res = str.match("\\s" + term + "\\s"); // \s on each side ensures the match is an independent phrase
But the problem is that the match fails as soon as I get a variation of the string. For example, for strings like these:
If there is a large amount of space between the words
Hello there     how are you?
If certain letters are capitalized
Hello There How are you?
What I want is to find a match as long as the substring 'there how' is present in the string as a separate phrase (not as in Hellothere how are you? or Hello there howare you?, etc.).
How can I achieve the objective?
Thanks to @Wiktor Stribiżew, he proposed this solution below
var ss = ["Hello there how are you?", "Hello there     how are you?", "Hello There How are you?"];
var term = "there how";
var rx = new RegExp("(?<!\\S)" + term.replace(/ /g, "\\s+") + "(?!\\S)", "i");
for (var i = 0; i < ss.length; i++) {
  var m = ss[i].match(rx) || "";
  console.log(m[0]);
}
While this works in an online Node.js environment such as repl.it (https://repl.it/repls/AwkwardSpitefulAnimatronics), it does not work in my regular JavaScript environment.
I get the error below for this line:
var rx = new RegExp("(?<!\\S)" + term.replace(/ /g, "\\s+") + "(?!\\S)", "i");
SyntaxError: invalid regexp group
How can I achieve my objective?
Upvotes: 2
Views: 108
Reputation: 3358
Depending on how you want your results to come back, you can approach the problem in one of two ways. If you want the matched phrase to be returned exactly the way it appears in the input, you can make the regex more general (option 1). However, if you want the results to come back matching the formatting of the search term, you can sanitize the input first to remove excess spaces and capitals (option 2).
As Tim mentioned in his answer, the \b word boundary should be sufficient to determine that the phrase is independent of other words in the input.
var ss = ["Hello there how are you?", "Hello there     how are you?", "Hello There How are you?", "Hello There Howare you?"]
// option 1: generalize the regex (any run of whitespace, any case)
function buildRgx(term){
  let spaceFix = term.split(' ').join('\\s+')
  return new RegExp('\\b' + spaceFix + '\\b', 'i')
}
var generalizedSearchTerm = buildRgx("there how")
ss.forEach(str => {
  let result = generalizedSearchTerm.exec(str)
  if(result){
    // declare the variables so they do not leak as implicit globals
    let strmatch = result[0],
      indexstart = result.index,
      indexend = indexstart + strmatch.length
    console.log(strmatch, indexstart, indexend)
  } else {
    console.log('no match found')
  }
})
// OR option 2: sanitize the input first
console.log('OR')
function sanitizeStr(str){ return str.toLowerCase().replace(/\s+/g, ' ') }
var simpleSearchTerm = new RegExp('\\b' + "there how" + '\\b')
ss.forEach(str => {
  let sanitizedString = sanitizeStr(str)
  // prints the match array, or null when the phrase is not found
  console.log(simpleSearchTerm.exec(sanitizedString))
})
Upvotes: 1
Reputation: 785156
Many browsers still don't support lookbehind, which is why you're getting that error. You may use this approach instead:
var ss = ["Hello there how are you?", "Hello there     how are you?", "Hello, There How are you?"];
var term = "there how";
var rx = new RegExp("(?:^|\\s)(" + term.replace(/ +/g, "\\s+") + ")(?!\\S)", "gi");
var m;
for (var i = 0; i < ss.length; i++) {
  while (m = rx.exec(ss[i])) {
    console.log('Start:', m.index, 'End:', rx.lastIndex, m[1]);
  }
}
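One detail worth noting about the loop above: when the phrase is not at the very start of the string, m.index points at the whitespace character consumed by the leading group, not at the first letter of the phrase. If exact offsets matter, something like the sketch below may help. The findPhrase helper is a hypothetical name introduced here for illustration, not part of the original answer.

```javascript
// Hypothetical helper: returns the matched phrase and its true offsets in `str`,
// or null when the phrase is not present as a separate phrase.
function findPhrase(str, term) {
  var rx = new RegExp("(?:^|\\s)(" + term.replace(/ +/g, "\\s+") + ")(?!\\S)", "i");
  var m = rx.exec(str);
  if (!m) return null;
  // m[0] may begin with one whitespace character; m[1] is the phrase itself,
  // so the length difference is the offset of the phrase inside the full match.
  var start = m.index + (m[0].length - m[1].length);
  return { text: m[1], start: start, end: start + m[1].length };
}

console.log(findPhrase("Hello there how are you?", "there how"));
// { text: 'there how', start: 6, end: 15 }
```

With these offsets, str.slice(start, end) gives back exactly the matched phrase.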
(?:^|\\s) is a non-capturing group that matches either the start of the string or a whitespace character on the left-hand side of term.
Upvotes: 1
Reputation: 3474
Below is an example using the term as part of the regex. Setup variables are from anubhava's answer.
// setup variables from other answers
var ss = ["Hello there how are you?", "Hello there     how are you?", "Hello There How are you?"];
var term = "there how";
// if you want to use the term in the regex, replace each space with \\s+ (one or more whitespace characters)
function replaceSpaces(s) {
  return s.replace(/ /g, "\\s+")
}
// create regex
// note: \s on both sides means the phrase must be surrounded by whitespace,
// so it will not match at the very start or end of a string
var pattern = new RegExp(`\\s${replaceSpaces(term)}\\s`)
// lowercase each string before testing to ignore case
// if the term itself may contain capitals, lowercase it as well before building the regex
console.log(ss.map(s => pattern.test(s.toLowerCase())))
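Since the term is pasted directly into the pattern, any regex metacharacter in it (for example a trailing ?) would be interpreted as regex syntax rather than literal text. Below is a minimal sketch of one way to guard against that, assuming the term is meant to be matched literally. Both escapeRegExp and buildPhraseRegex are hypothetical helper names introduced here, and the boundary checks are borrowed from anubhava's answer.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a case-insensitive regex for a literal phrase,
// allowing any run of whitespace between its words.
function buildPhraseRegex(term) {
  var body = term.trim().split(/\s+/).map(escapeRegExp).join("\\s+");
  return new RegExp("(?:^|\\s)(" + body + ")(?!\\S)", "i");
}

var rx = buildPhraseRegex("how are you?");
console.log(rx.test("Hello There   How are YOU?")); // true
console.log(rx.test("Hello there how are you"));    // false: the "?" must be present literally
```

Without the escaping step, the ? would make the final u optional, and the second string would match even though it lacks the question mark.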
Upvotes: 1
Reputation: 521249
The (?<!\\S) portion of the regex string is what is causing the error: your regular version of JavaScript does not support lookbehinds, not even fixed-width ones. One workaround is to use a word boundary there instead:
var rx = new RegExp("\\b" + term.replace(/ /g, "\\s+") + "(?!\\S)", "i");
Assuming your term starts and ends with word characters, \b should be sufficient to cover the behavior you want.
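To make that assumption concrete, here is a small sketch comparing \b with an explicit start-or-whitespace check. The two builder functions are hypothetical names used only for this comparison. The word boundary fails when the term begins with a non-word character such as #, and it also accepts a phrase that is glued to the previous word by punctuation.

```javascript
// Hypothetical builders, used only to compare the two left-hand boundary checks.
function withWordBoundary(term) {
  return new RegExp("\\b" + term.replace(/ /g, "\\s+") + "(?!\\S)", "i");
}
function withWhitespaceCheck(term) {
  return new RegExp("(?:^|\\s)" + term.replace(/ /g, "\\s+") + "(?!\\S)", "i");
}

// Term starting with a non-word character: there is no \b between " " and "#".
console.log(withWordBoundary("#there how").test("Hello #there how are you?"));    // false
console.log(withWhitespaceCheck("#there how").test("Hello #there how are you?")); // true

// Phrase attached to the previous word by a hyphen: \b still sees a boundary.
console.log(withWordBoundary("there how").test("Hello-there how are you?"));      // true
console.log(withWhitespaceCheck("there how").test("Hello-there how are you?"));   // false
```

Which behavior is preferable depends on whether punctuation should count as a separator for your input.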
Upvotes: 1