Reputation: 1923
I have the use case, where I need to allow for processing an arbitrary array of strings, by an arbitrary regex, which is created either by a regex literal, or through the new RegExp()
constructor.
It all works fine, until the global g
flag is used with capturing groups.
I read a few answers on SO, and the proposed solution is to use regex.exec(string) in a while loop, e.g. How do you access the matched groups in a JavaScript regular expression?, JavaScript regular expressions and sub-matches
I also talked about it on IRC, and was advised against the implementation all together:
but there's regexes that will segfault your engine, unless you're using spidermonkey.
So here is a corner case, try pasting it into a fiddle or plunker, or even the console, it just breaks:
var regexString = '([^-]*)';
var flags = 'ig';
var regex = new RegExp(regexString, flags);
var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];
arr.forEach(function(item) {
var match;
var inter = [];
while (match = regex.exec(item)) {
inter.push(match[0]);
}
});
console.log(result);
I did tried it on regex101.com https://regex101.com/r/xG0cL4/1
It breaks even if I do it without the quantifier, i.e. /([^-])/g
https://regex101.com/r/yT7sQ2/1
My question: what is the (correct|safe)
way to process arbitrary regexes against arbitrary strings?
Upvotes: 1
Views: 2155
Reputation: 7378
It's not working because when '-' is reached, exec() fails to match the '-' character, but does match 0 characters (because of *), so it doesn't skip it, and thus it gets stuck. If you use -|([^-]*)
, then it will skip the '-' character. You will need to then check the 'match.index' property to see if you've reach the end.
Also, you should be adding match[1]
not match[0]
if your intent is to save the matched text.
This works:
var regexString = '-|([^-]*)'; // or better yet: '([^-]+)' will work also
var flags = 'ig';
var regex = new RegExp(regexString, flags);
var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];
arr.forEach(function(item) {
var match;
var inter = [];
while (match = regex.exec(item)) {
if (match.index >= item.length) break;
else if (match[1] !== void 0) inter.push(match[1]);
}
});
console.log(result);
but why not use 'match()' instead?
var regexString = '[^-]+';
var flags = 'gi';
var regex = new RegExp(regexString, flags);
var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];
arr.forEach(function(item) {
var inter = item.match(regex);
});
console.log(result);
Upvotes: 1
Reputation: 10909
Why it's crashing: an infinite loop was created by using a RegExp object with a global flag and exec
in a loop with a zero-width match. The loop hits the first '-' but doesn't match that character because of the negated character class. It then reverts back to the zero-length match and so doesn't advance the index value for exec
. This means on the next loop it starts back at the same spot and does the exact same thing...infinitely.
That said, it's hard to tell what exactly you want but why not try match
? It looks like you only care about the matching string so exec
seems to be a bit of overkill.
If the desired output is a one-to-one array of results with the input array:
function foo(regexp, strings) {
return strings.reduce(function(matches, str) {
matches.push(str.match(regexp));
return matches;
}, []);
}
foo(/([^-]+)/ig, arr);
// outputs: [["some","property","image.png"],["another","prop","video.png"],["y","no","work.bmp"]]
foo(new RegExp('([^-]+)', 'ig'), arr);
// outputs: [["some","property","image.png"],["another","prop","video.png"],["y","no","work.bmp"]]
Even with the zero-width matching it won't go into an infinite loop:
foo(/([^-]*)/ig, arr));
// outputs: [["some","","property","","image.png",""],["another","","prop","","video.png",""],["y","","no","","work.bmp",""]]
If the desired output really is one array of all the matches:
function foo(regexp, strings) {
return strings.reduce(function(matches, str) {
return matches.concat(str.match(regexp));
}, []);
}
foo(/([^-]+)/ig, arr);
// outputs: ["some","property","image.png","another","prop","video.png","y","no","work.bmp"]
foo(new RegExp('([^-]+)', 'ig'), arr);
// outputs: ["some","property","image.png","another","prop","video.png","y","no","work.bmp"]
foo(/([^-]*)/ig, arr));
// outputs: ["some","","property","","image.png","","another","","prop","","video.png","","y","","no","","work.bmp",""]
Upvotes: 0
Reputation: 782407
The match
object has an index
property that contains the position of the current match. If that stays the same between two calls in the loop, it means you're stuck.
var regexString = '([^-]*)';
var flags = 'ig';
var regex = new RegExp(regexString, flags);
var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];
arr.forEach(function(item) {
var match;
var inter = [];
var lastIndex = -1;
while (match = regex.exec(item)) {
if (match.index == lastIndex) {
break;
}
lastIndex = match.index;
inter.push(match[0]);
}
result.push(inter);
});
console.log(result);
Upvotes: 0