M.Arkk
M.Arkk

Reputation: 37

Prevent catastrophic backtracking in this regex

(\W*\d+)*(?=\W|$)

I'm having a problem with this regex, causing the system to crash when a certain term is searched.

I'm trying to find way to remove the catastrophic backtracking without changing its logic, but so far I've got nothing.

Term: 0000000000000000000000000000Abc

You can test it, and see the timeout, here:

Upvotes: 1

Views: 658

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626747

You need to remove + after \d since otherwise, it would take regex engine too long to test all possible ways of matching a non-matching string before admitting the no match.

Use

(?:\W*\d)*(?!\w)

or

(?:\W*\d)*\b

The \W matches any non-word char (so, no digit) and when d is matched, \W* will consume 0 or more non-word chars enabling a "linear" way of matching, when the subsequent subpattern does not match the same text as the preceding subpattern.

The (?!\w) lookahead works a bit faster than an alternation group (?:\W|$) as all it has to do it check if the next char is a word char, and if yes, the match is failed. Actually, in this situation, after \d, (?!\w) is equal to \b, a word boundary, so it is the best construct to use in this position.

See the regex demo.

Upvotes: 2

Related Questions