Reputation: 2596
Recently I found following regex in the code. As the string it was checking was quite large it was freezing the browser. Running some experiments I measured the times using following code
var s = 'Lorem ipsum dolor sit amet, rhoncus nam sem feugiat, vel vel, viverra ultrices interdum. Volutpat ac congue. Lacinia sit donec quis facilisi, magna cubilia volutpat lectus fusce ligula quis, sit sed vivamus eget mauris quisque, aenean aenean nec litora litora massa, malesuada turpis pretium. Magnis metus nulla mauris dictum ligula, odio facilisis nullam laoreet. Aliquam tincidunt enim sit dolor mi. Duis malesuada pede, tortor consectetuer facilisis massa et leo vel. Eget fames tellus mi. Suscipit tincidunt fusce lacus convallis, ornare eu sed eu gravida interdum. Vivamus ipsum, maecenas penatibus, lacus posuere, eu cum, mauris ea libero elit. Libero blandit mattis mi sapien, iaculis wisi sit convallis, est in libero, elementum cras in a cum a vestibulum';
for (var i = 0; i < 3; i++) {
s += s;
}
start = new Date().getTime();
s.match(/AAA/i)
stop = new Date().getTime();
console.log("AAA took " + (stop - start) + " ms")
start = new Date().getTime();
s.match(/BBB/i)
stop = new Date().getTime();
console.log("BBB took " + (stop - start) + " ms")
start = new Date().getTime();
s.match(/CCC/i)
stop = new Date().getTime();
console.log("CCC took " + (stop - start) + " ms")
start = new Date().getTime();
s.match(/.*(AAA|BBB|CCC).*/i)
stop = new Date().getTime();
console.log("Combined took " + (stop - start) + " ms")
The above print
AAA took 0 ms
BBB took 1 ms
CCC took 0 ms
Combined took 53 ms
Could you explain in plain words why this regex is so slow, while checking for individual parts takes almost no time? Is there another way of writing a one line regex to check for occurrence of multiple strings which would produce the results faster?
Upvotes: 2
Views: 448
Reputation: 626952
The /AAA/i
, /BBB/i
, /CCC/i
differ from /.*(AAA|BBB|CCC).*/i
not only in the use of alternation capturing group, but also in .*
patterns.
The /.*(AAA|BBB|CCC).*/i
pattern slows down matching due to the first .*
that is a greedy dot pattern. It works in the following way:
AAA
or BBB
or CCC
, thus backtracking starts: the engine yields a char from the end of the match and tries to match one of the alternativesSee this regex debugger with visualized backtracking steps.
So, the best way to find if a line / string contains any of the 3 alternatives is just to use
/(?:AAA|BBB|CCC)/i
since this regex can find partial matches (it does not require the full string match as in Java's String#match()
).
If you need to find lines in a multiline string having one of the alternatives, it is a good idea to process line by line (say, split with \n
) and then .filter(x => /(?:AAA|BBB|CCC)/i.test(x))
.
Note (?:...)
is a non-capturing group that does not create submatches, and there is less overhead since the engine does not have to allocate additional memory for the captures.
Upvotes: 1
Reputation: 12880
You didn't include the .*
in your previous tests, resulting in your match
returning false very quickly.
Here, you can see each test takes about 30ms
to run.
The global still takes only 40ms
.
Beyond all that, using s.match(/AAA|BBB|CCC/i)
will be way faster.
var s = 'Lorem ipsum dolor sit amet, rhoncus nam sem feugiat, vel vel, viverra ultrices interdum. Volutpat ac congue. Lacinia sit donec quis facilisi, magna cubilia volutpat lectus fusce ligula quis, sit sed vivamus eget mauris quisque, aenean aenean nec litora litora massa, malesuada turpis pretium. Magnis metus nulla mauris dictum ligula, odio facilisis nullam laoreet. Aliquam tincidunt enim sit dolor mi. Duis malesuada pede, tortor consectetuer facilisis massa et leo vel. Eget fames tellus mi. Suscipit tincidunt fusce lacus convallis, ornare eu sed eu gravida interdum. Vivamus ipsum, maecenas penatibus, lacus posuere, eu cum, mauris ea libero elit. Libero blandit mattis mi sapien, iaculis wisi sit convallis, est in libero, elementum cras in a cum a vestibulum'
for (var i = 0; i < 3; i++) {
s += s;
}
start = new Date().getTime();
s.match(/.*AAA.*/i)
stop = new Date().getTime();
console.log("AAA took " + (stop - start) + " ms")
start = new Date().getTime();
s.match(/.*BBB.*/i)
stop = new Date().getTime();
console.log("BBB took " + (stop - start) + " ms")
start = new Date().getTime();
s.match(/.*CCC.*/i)
stop = new Date().getTime();
console.log("CCC took " + (stop - start) + " ms")
start = new Date().getTime();
s.match(/.*(AAA|BBB|CCC).*/i)
stop = new Date().getTime();
console.log("Combined took " + (stop - start) + " ms")
Upvotes: 0