Reputation: 1017
Is there an algorithm to determine whether a given JavaScript regex is vulnerable to ReDoS? The algorithm doesn't have to be perfect - some false positives and false negatives are acceptable. (I'm specifically interested in ECMA-262 regexes.)
Upvotes: 2
Views: 672
Reputation: 11779
TL;DR sort of, but not fully
In [9]: re.compile("(a+)+", re.DEBUG)
max_repeat 1 4294967295
subpattern 1
max_repeat 1 4294967295
literal 97
Note those nested repeat 1..N, for large N, that's bad.
This takes care of all Wikipedia examples, except (a|aa)+
and a*b?a*x
.
Likewise it's hard to account for back-references, if your engine supports those.
IMO evil regexp is combination of two factors: combinatorial explosion and oversight in engine implementation. Thus, worst case also depends on regexp engine and sometimes flags. Backtracking is not always easy to identify.
Simple cases, however, can be identified.
Upvotes: 0
Reputation: 33578
It is hard to verify whether a regexp is evil or not without actually running it. You could try detecting some of the patterns detailed in the Wiki and generalise them:
e.g. For
You could check for )+
or )*
or ){
sequences and validate against them. However, I guarantee that an attacker will find their way round them.
In essence it is a minefield to allow user set regexps. However, if you can timeout the regexp search, terminate the thread and then mark that regexp as "bad" you can mitigate the threat somewhat. In the case that the regexp is used later, maybe you could validate it by running it against an expected input at point of entry?
Later you will still need to be able to terminate it if the text evaluated at the later stage has a different effect with your regexp and mark it as bad so it will not be used again without user intervention.
Upvotes: 2