How can I programmatically identify evil regexes?

Is there an algorithm to determine whether a given JavaScript regex is vulnerable to ReDoS? The algorithm doesn't have to be perfect - some false positives and false negatives are acceptable. (I'm specifically interested in ECMA-262 regexes.)

Upvotes: 2

Answers (2)

Dima Tisnek

Reputation: 11779

TL;DR sort of, but not fully

In [9]: re.compile("(a+)+", re.DEBUG)
max_repeat 1 4294967295
  subpattern 1
    max_repeat 1 4294967295
      literal 97

Note those nested repeat 1..N, for large N, that's bad.

This takes care of all Wikipedia examples, except (a|aa)+and a*b?a*x.

Likewise it's hard to account for back-references, if your engine supports those.

IMO evil regexp is combination of two factors: combinatorial explosion and oversight in engine implementation. Thus, worst case also depends on regexp engine and sometimes flags. Backtracking is not always easy to identify.

Simple cases, however, can be identified.

Upvotes: 0

SilverlightFox

Reputation: 33578

It is hard to verify whether a regexp is evil or not without actually running it. You could try detecting some of the patterns detailed in the Wiki and generalise them:

e.g. For

(a+)+
([a-zA-Z]+)*
(a|aa)+
(a|a?)+
(.*a){x} for x > 10

You could check for )+ or )* or ){ sequences and validate against them. However, I guarantee that an attacker will find their way round them.

In essence it is a minefield to allow user set regexps. However, if you can timeout the regexp search, terminate the thread and then mark that regexp as "bad" you can mitigate the threat somewhat. In the case that the regexp is used later, maybe you could validate it by running it against an expected input at point of entry?

Later you will still need to be able to terminate it if the text evaluated at the later stage has a different effect with your regexp and mark it as bad so it will not be used again without user intervention.

Upvotes: 2

How can I programmatically identify evil regexes?

Answers (2)

Related Questions