Reputation: 432
Hey, I have code like this:
var text = "We are downing to earth";
var regexes = "earth|art|ear";
var regsult; // declared here so it does not leak as an implicit global
if (regexes.length) {
    var reg = new RegExp(regexes, "ig");
    console.log(reg);
    while ((regsult = reg.exec(text)) !== null) {
        var word = regsult[0];
        console.log(word);
    }
}
I want to get all the matching words from the text. The result should include "earth", "art" and "ear", because "earth" contains those substrings. Instead, it only produces "earth".
Is there a mistake in my regex pattern, or should I use another approach in JS?
Thanks
Upvotes: 1
Views: 148
As discussed in another answer, a single regexp cannot match multiple overlapping alternatives. In your case, simply do a separate regexp test for each word you are looking for:
var text = "We are downing to earth";
var regexes = ["earth", "art", "ear"];
var results = [];
for (var i = 0; i < regexes.length; i++) {
    var word = regexes[i];
    // keep every word that occurs somewhere in the text
    if (text.match(word)) results.push(word);
}
You could tighten this up a little bit by doing:
regexes.filter(function(word) { return (text.match(word) || [])[0]; });
If your "regexes" are actually just strings, you could just use indexOf and keep things simpler:
regexes.filter(function(word) { return text.indexOf(word) !== -1; });
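One thing to keep in mind is that indexOf is case-sensitive, while the code in the question used the "i" flag. As a minimal sketch, assuming the search terms are plain strings rather than patterns, both sides can be lower-cased before comparing. The helper name wordsFoundIn and the extra word "moon" are made up for this illustration:

```javascript
// Hypothetical helper: returns the words that occur in text, ignoring case.
// Assumes every entry in words is a literal string, not a regex pattern.
function wordsFoundIn(text, words) {
  const haystack = text.toLowerCase();
  return words.filter(function (word) {
    return haystack.indexOf(word.toLowerCase()) !== -1;
  });
}

const foundWords = wordsFoundIn("We are downing to EARTH", ["earth", "art", "ear", "moon"]);
console.log(foundWords); // logs the array ["earth", "art", "ear"]
```

Each word is tested on its own, so overlapping words are never in competition with each other.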
Upvotes: 2
Reputation: 336158
You only get "earth" as a match because the regex engine matched "earth" as the first alternative and then moved on in the source string, oblivious to the fact that it could also have matched "ear" or "art". This is expected behavior in all regex engines: they don't try to return all possible matches, just the first one, and matches generally can't overlap.
Whether "earth" or "ear" is returned depends on the regex engine. A POSIX ERE engine will always return the leftmost, longest match, whereas most current regex engines (including JavaScript's) will return the first possible match, depending on the order of alternation in the regex.
So art|earth|ear would return "earth", whereas ear|art|earth would return "ear".
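The effect of alternation order is easy to check in JavaScript. This is a small sketch; the helper name firstMatch is made up for the illustration, and it returns only the first match of a pattern:

```javascript
// Hypothetical helper: returns the first match of pattern in text, or null.
function firstMatch(pattern, text) {
  const m = new RegExp(pattern, "i").exec(text);
  return m === null ? null : m[0];
}

const sampleText = "We are downing to earth";
console.log(firstMatch("art|earth|ear", sampleText)); // "earth"
console.log(firstMatch("ear|art|earth", sampleText)); // "ear"
console.log(firstMatch("earth|art|ear", sampleText)); // "earth"
```

In the first pattern, art cannot match at the "e" of "earth", so the engine falls through to the next alternative at that same position before it ever moves on to the "a".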
You can make the regex find overlapping matches (as long as they start at different positions in the string) by using a positive lookahead assertion:
(?=(ear|earth|art))
will find "ear" and "art", but not "earth", because it starts at the same position as "ear". Note that in this case you must not look at the regex's entire match (regsult[0] in your code) but at the content of the capturing group (regsult[1]). Because the lookahead itself matches an empty string, a JavaScript exec() loop must also advance reg.lastIndex manually after each match, otherwise it keeps returning the same match forever.
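Here is a sketch of how the lookahead could be wired into a loop like the one in the question. The function name overlappingMatches is made up for this example. The manual lastIndex step is there because a zero-length match leaves lastIndex where it was, so exec() would otherwise find the same position again:

```javascript
// Hypothetical helper: collects the capturing group of a lookahead-wrapped
// alternation, so matches starting at different positions may overlap.
function overlappingMatches(text, alternation) {
  const reg = new RegExp("(?=(" + alternation + "))", "ig");
  const found = [];
  let result;
  while ((result = reg.exec(text)) !== null) {
    found.push(result[1]); // group 1, not result[0] (which is always "")
    // Zero-length match: step forward manually to avoid an endless loop.
    if (result.index === reg.lastIndex) {
      reg.lastIndex++;
    }
  }
  return found;
}

const overlaps = overlappingMatches("We are downing to earth", "ear|earth|art");
console.log(overlaps); // logs the array ["ear", "art"]
```

As described above, "earth" is still missing, because only one alternative can win at any given start position.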
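The only way around this that I can think of at the moment would be to use
(?=(ear(th)?|art))
which would produce a result like [["", "earth", "th"], ["", "art", undefined]]. Since the optional group is greedy, the first group holds "earth" when it is available, and a defined second group tells you that both "ear" and "earth" matched at that position.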
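To turn that result into the full list of words, the optional group has to be unpacked by hand. The following is only a sketch for this specific pattern, with a made-up function name (expandMatches). In JavaScript the optional (th)? is greedy, so group 1 holds "earth" when it can and group 2 holds "th":

```javascript
// Hypothetical helper, hard-wired to the pattern (?=(ear(th)?|art)).
function expandMatches(text) {
  const reg = /(?=(ear(th)?|art))/ig;
  const words = [];
  let m;
  while ((m = reg.exec(text)) !== null) {
    if (m[2] !== undefined) {
      // Group 2 took part, so the shorter prefix matched here as well.
      words.push(m[1].slice(0, m[1].length - m[2].length));
    }
    words.push(m[1]);
    // Zero-length match: advance manually so exec() does not repeat itself.
    if (m.index === reg.lastIndex) {
      reg.lastIndex++;
    }
  }
  return words;
}

console.log(expandMatches("We are downing to earth")); // logs ["ear", "earth", "art"]
```

This does not generalize well: every pair of words sharing a start position needs its own optional group, so for an arbitrary word list, testing each word separately is probably simpler.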
Upvotes: 1