Reputation: 1442
Please see my regular expression pattern code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
print 'Start'
str1 = 'abcdefgasdsdfswossdfasdaef'
m = re.match(r"([A-Za-z\-\s\:\.]+)+(\d+)\w+", str1) # Want to match something like 'Moto 360x'
print m # None is expected.
print 'Done'
It takes 49 seconds to finish, any problem with the pattern?
Upvotes: 2
Views: 156
Reputation: 36028
See Runaway Regular Expressions: Catastrophic Backtracking.
In brief, if there are extremely many combinations a substring can be split into the parts of the regex, the regex matcher may end up trying them all.
Constructs like (x+)+
and x+x+
practically guarantee this behaviour.
To detect and fix the problematic constructs, the following concept can be used:
At conceptual level, the presence of a problematic construct means that your regex is ambiguous - i.e. if you disregard greedy/lazy behaviour, there's no single "correct" split of some text into the parts of the regex (or, equivalently, a subexpression thereof). So, to avoid/fix the problems, you need to see and eliminate all ambiguities.
One way to do this is to
Upvotes: 7
Reputation: 1442
Just repost the answer and solution in comments from nhahtdh and Marc B:
([A-Za-z\-\s\:\.]+)+
--> [A-Za-z\-\s\:\.]+
Thanks so much to nhahtdh and Marc B!
Upvotes: 0