ans4175

Reputation: 432

How to check multiple matching words with regex in Javascript

Hey, I have code like this:

var text = "We are downing to earth";
var regexes = "earth|art|ear";
if (regexes.length) {
  var reg = new RegExp(regexes, "ig");
  var regsult; // declared here so it does not leak as an implicit global
  console.log(reg);
  while ((regsult = reg.exec(text)) !== null) {
    var word = regsult[0];
    console.log(word);
  }
}

I want to get every matching word from the text. The result should include "earth", "art" and "ear", because "earth" contains both of those substrings. Instead, it only produces "earth".

Is there a mistake in my regex pattern? Or should I use another approach in JS?

Thanks

Upvotes: 1

Views: 148

Answers (2)

user663031

Reputation:

As discussed in another answer, a single regexp cannot match multiple overlapping alternatives. In your case, simply do a separate regexp test for each word you are looking for:

var text = "We are downing to earth";
var regexes = ["earth", "art", "ear"];

var results = [];
for (var i = 0; i < regexes.length; i++) {
  var word = regexes[i];
  // String.prototype.match converts a string argument into a RegExp
  if (text.match(word)) results.push(word);
}

You could tighten this up a little bit by doing

regexes.filter(function(word) { return (text.match(word) || [])[0]; });

If your "regexes" are actually just strings, you could just use indexOf and keep things simpler:

regexes.filter(function(word) { return text.indexOf(word) !== -1; });
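
One thing to watch: the pattern in the question used the i flag, while text.match(word) with a plain string and indexOf are both case-sensitive. A minimal sketch of a case-insensitive variant, assuming the words are plain strings rather than patterns (findWords is a made-up helper name, not part of any library):

```javascript
// findWords is a hypothetical helper: it returns every entry of `words`
// that occurs somewhere in `text`, ignoring case.
function findWords(text, words) {
  var haystack = text.toLowerCase();
  return words.filter(function (word) {
    return haystack.indexOf(word.toLowerCase()) !== -1;
  });
}

console.log(findWords("We are downing to Earth", ["earth", "art", "ear"]));
// logs the three words: earth, art, ear
```

Lowercasing both sides keeps the indexOf approach and avoids having to escape regex metacharacters in the words.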

Upvotes: 2

Tim Pietzcker

Reputation: 336158

You only get earth as a match because the regex engine has matched earth as the first alternative and then moved on in the source string, oblivious to the fact that you could also have matched ear or art. This is expected behavior with all regex engines - they don't try to return all possible matches, just the first one, and matches generally can't overlap.

Whether earth or ear is returned depends on the regex engine. A POSIX ERE engine will always return the leftmost, longest match, whereas most current regex engines (including JavaScript's) will return the first possible match, depending on the order of alternation in the regex.

So art|earth|ear would return earth, whereas ear|art|earth would return ear.
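
A quick way to check this in JavaScript, as a small sketch using the text from the question (firstMatch is a hypothetical helper name introduced only for this example):

```javascript
var text = "We are downing to earth";

// firstMatch is a hypothetical helper: it returns the first match of
// `pattern` in `text` (case-insensitive), or null if there is none.
function firstMatch(pattern) {
  var m = text.match(new RegExp(pattern, "i"));
  return m ? m[0] : null;
}

console.log(firstMatch("art|earth|ear")); // "earth"
console.log(firstMatch("ear|art|earth")); // "ear"
```

In both cases the match starts at the "e" of "earth"; only the order of the alternatives decides which one wins there.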

You can make the regex find overlapping matches (as long as they start in different positions in the string) by using positive lookahead assertions:

(?=(ear|earth|art))

will find ear and art, but not earth, because earth starts at the same position as ear. Note that in this case you must not read the regex's entire match (regsult[0] in your code), which is always an empty string, but the content of the capturing group (regsult[1]).
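
Adapting the loop from the question to the lookahead pattern needs one extra step: a lookahead match is zero-length, and exec with the g flag does not move lastIndex past a zero-length match, so the loop would find the same position forever. A minimal sketch, reusing the variable names from the question:

```javascript
var text = "We are downing to earth";
var reg = /(?=(ear|earth|art))/gi;
var words = [];
var regsult;

while ((regsult = reg.exec(text)) !== null) {
  // The overall match is empty; the word is in capturing group 1.
  words.push(regsult[1]);
  // Step past the zero-length match by hand to avoid an infinite loop.
  if (regsult.index === reg.lastIndex) {
    reg.lastIndex++;
  }
}

console.log(words); // ["ear", "art"]
```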

The only way around this that I can think of at the moment would be to use

(?=(ear(th)?|art))

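which would have a result like [["", "earth", "th"], ["", "art", undefined]]. The first group holds the longest word found at that position; when the second group is defined, the shorter ear matched there as well.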
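
Running that pattern shows what each group captures and how both ear and earth can be recovered from a single match. This is only a sketch: it assumes an environment with String.prototype.matchAll (ES2020), and the variable names are made up for the example.

```javascript
var text = "We are downing to earth";
var found = [];

// matchAll steps past zero-length matches by itself, so no manual
// lastIndex handling is needed here.
for (var m of text.matchAll(/(?=(ear(th)?|art))/gi)) {
  // Group 1 is the longest alternative that matched at this position.
  found.push(m[1]);
  // If the optional (th) took part, the shorter prefix matched here too.
  if (m[2] !== undefined) {
    found.push(m[1].slice(0, -m[2].length));
  }
}

console.log(found); // ["earth", "ear", "art"]
```

This only works because ear is a prefix of earth; unrelated words that start at the same position would still need separate tests.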

Upvotes: 1
