Souvik Ray

Reputation: 3018

How to look for a substring in a string using regex in JavaScript?

I have a string like the one below:

Hello there how are you?

I want to look for the substring 'there how' in the string. So I would do something like this

var str = "Hello there how are you?"; 
var term = "there how";
var res = str.match("\\s" + term + "\\s"); // \s is used to ensure the match is an independent phrase

The problem is that the match fails if the string varies even slightly. For example, it fails for strings like these:

If there is a large amount of space between the words

Hello there         how are you?

If certain letters are capitalized

Hello There How are you?

What I want is to find a match as long as the substring 'there how' is present in the string as a separate phrase (not like Hellothere how are you? or Hello there howare you? etc).

How can I achieve the objective?

Thanks to @Wiktor Stribiżew, who proposed the solution below:

var ss = ["Hello there how are you?", "Hello there         how are you?", "Hello There How are you?"];
var term = "there how";
var rx = new RegExp("(?<!\\S)" + term.replace(/ /g, "\\s+") + "(?!\\S)", "i");
for (var i=0; i<ss.length; i++) {
    var m = ss[i].match(rx) || "";
    console.log(m[0]);
}

While this works in an online Node.js environment such as repl (https://repl.it/repls/AwkwardSpitefulAnimatronics), it doesn't work in regular JavaScript.

I get the error below in JavaScript for this line:

var rx = new RegExp("(?<!\\S)" + term.replace(/ /g, "\\s+") + "(?!\\S)", "i");    

SyntaxError: invalid regexp group

How can I achieve my objective?

Upvotes: 2

Views: 108

Answers (4)

jmcgriz

Reputation: 3358

Depending on how you want your results to come back, you can approach the problem in one of two ways. If you want the matched phrase returned exactly as it appears in the input, you can make the regex more general (option 1). If you want the results to match the formatting of the search term instead, you can sanitize the input first to remove excess spaces and capitals (option 2).

As Tim mentioned, the \b word boundary should be sufficient to determine that the phrase is independent of other words in the input.

var ss = ["Hello there how are you?", "Hello there         how are you?", "Hello There How are you?", "Hello There Howare you?"]



function buildRgx(term){
  let spaceFix = term.split(' ').join('\\s+')
  return new RegExp('\\b' + spaceFix + '\\b', 'i')
}

var generalizedSearchTerm = buildRgx("there how")

ss.forEach(str => {
  let result = generalizedSearchTerm.exec(str)
  if(result){
    // declare the values locally instead of leaking implicit globals
    const strmatch = result[0],
          indexstart = result.index,
          indexend = indexstart + strmatch.length

    console.log(strmatch, indexstart, indexend)
  } else {
    console.log('no match found')
  }
})


//OR sanitize the input first
console.log('OR')

function sanitizeStr(str){ return str.toLowerCase().replace(/\s+/g, ' ') }

var simpleSearchTerm = new RegExp('\\b' + "there how" + '\\b')

ss.forEach(str => {
  let sanitizedString = sanitizeStr(str)
  console.log(simpleSearchTerm.exec(sanitizedString))
})
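
One thing the snippets above leave open is what happens when the search term itself contains regex metacharacters such as . or ?. A minimal sketch of one way to handle that, assuming the term still starts and ends with word characters so that \b applies. The names escapeRegExp and buildSafeRgx are hypothetical and not part of the answer above:

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of buildRgx: escape each word, then join the words
// with \s+ so any run of whitespace between them is accepted.
function buildSafeRgx(term) {
  const words = term.trim().split(/\s+/).map(escapeRegExp);
  return new RegExp('\\b' + words.join('\\s+') + '\\b', 'i');
}

console.log(buildSafeRgx('there how').test('Hello There    How are you?')); // true
console.log(buildSafeRgx('a.b c').test('xx a.b   c yy')); // true
console.log(buildSafeRgx('a.b c').test('xx aXb c yy'));   // false
```

Without the escaping step, the dot in 'a.b c' would match any character, so the last call would report a match.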

Upvotes: 1

anubhava

Reputation: 785156

Many browsers still don't support lookbehind, which is why you're getting that error. You may use this approach instead:

var ss = ["Hello    there how are you?", "Hello there         how are you?", "Hello, There How are you?"];
var term = "there how";

var rx = new RegExp("(?:^|\\s)(" + term.replace(/ +/g, "\\s+") + ")(?!\\S)", "gi");

var m;
for (var i=0; i<ss.length; i++) {
    while(m = rx.exec(ss[i])) {
      console.log('Start:', m.index, 'End:', rx.lastIndex, m[1]);
    }
}

  • (?:^|\\s) is a non-capturing group that matches the start of the string or a whitespace character on the left-hand side of the term.
  • Also note the use of a capturing group to grab your desired substring from the given input.
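
Because the leading (?:^|\\s) group consumes a whitespace character when one is present, m.index in the loop above points at that whitespace rather than at the first letter of the phrase. A rough sketch of one way to report the phrase's own offsets, assuming a non-empty term. The helper name findPhraseOffsets is hypothetical:

```javascript
// Hypothetical helper: returns the start (inclusive) and end (exclusive)
// offsets of each phrase match, skipping the whitespace character consumed
// by the leading (?:^|\s) group.
function findPhraseOffsets(text, term) {
  const rx = new RegExp('(?:^|\\s)(' + term.replace(/ +/g, '\\s+') + ')(?!\\S)', 'gi');
  const hits = [];
  let m;
  while ((m = rx.exec(text)) !== null) {
    // m[0] is the whole match and m[1] is the phrase; their length
    // difference is the leading whitespace character, if there was one.
    const start = m.index + (m[0].length - m[1].length);
    hits.push({ start: start, end: rx.lastIndex, text: m[1] });
  }
  return hits;
}

console.log(findPhraseOffsets('Hello There   How are you?', 'there how'));
// [ { start: 6, end: 17, text: 'There   How' } ]
```

With these offsets, text.slice(start, end) gives back exactly the captured phrase.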

Upvotes: 1

James Whiteley

Reputation: 3474

Below is an example using the term as part of the regex. Setup variables are from anubhava's answer.

// setup variables from other answers
var ss = ["Hello there how are you?", "Hello there         how are you?", "Hello There How are you?"];
var term = "there how";

// if you want to use the term in the regex, replace each space with \\s+ (1 or more whitespace characters)
function replaceSpaces(s) {
  return s.replace(/ /g, "\\s+")
}

// create regex
var pattern = new RegExp(`\\s${replaceSpaces(term)}\\s`)

// lowercase the input before comparing to ignore case
// if the term may contain uppercase letters, lowercase it as well before building the regex
console.log(ss.map(s => pattern.test(s.toLowerCase())))

Upvotes: 1

Tim Biegeleisen

Reputation: 521249

The (?<!\\S) portion of the regex string is what causes the error: your JavaScript engine does not support lookbehinds, not even fixed-width ones. One workaround is to use a word boundary there instead:

var rx = new RegExp("\\b" + term.replace(/ /g, "\\s+") + "(?!\\S)", "i");

Assuming your term starts and ends with word characters, \b should be sufficient to cover the behavior you want.
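
If the same code has to run both where lookbehinds are supported and where they are not, one option is to probe for support at runtime and pick the left-hand boundary accordingly. This is only a sketch under that assumption. The names supportsLookbehind and buildPhraseRegex are hypothetical, and the \b fallback still assumes the term starts with a word character:

```javascript
// Hypothetical helper: true when this engine can compile a lookbehind.
// An unsupported pattern makes the RegExp constructor throw a SyntaxError,
// which can be caught at runtime.
function supportsLookbehind() {
  try {
    new RegExp('(?<!\\S)x');
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: use the lookbehind when available, else fall back to \b.
function buildPhraseRegex(term) {
  const body = term.replace(/ +/g, '\\s+');
  const left = supportsLookbehind() ? '(?<!\\S)' : '\\b';
  return new RegExp(left + body + '(?!\\S)', 'i');
}

const rx = buildPhraseRegex('there how');
console.log(rx.test('Hello There     How are you?')); // true
console.log(rx.test('Hellothere how are you?'));      // false
console.log(rx.test('Hello there howare you?'));      // false
```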

Upvotes: 1
