Reputation: 123
I'm trying to match any of the following lines with a regex in Python:
RAA RAA
RAA RAA / OOO OOO
RAA RAA / OOO OOO / ROCKY
These strings should always be on their own line, so "RAA RAA moves over there." wouldn't match.
I came up with this regex using RegExr:
^([A-Z]*([ ]?)*([A-Z]?)*([ \/]?)*)*$
This matches all the different lines; however, it causes Python to hang when it tries to match "RAA RAA moves over there."
I've no idea why. Are there any regex experts that might have some insight?
Upvotes: 1
Views: 422
Reputation: 96487
Your entire pattern is made up of optional matches, which is likely causing a lot of backtracking and thus the hang. Try using a mandatory match where it makes sense, such as:
^([A-Z]+([ ]?)+([A-Z])*([ /])*)*$
A cleaner pattern, without the unnecessary capturing groups, would be:
^([A-Z]+[ ]?)+([A-Z]+[ /]*)*$
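Notice that using + instead of * ensures that at least one character must match, rather than making the entire pattern optional and taxing the regex engine. One caveat: a + group nested inside another repeated group can still backtrack heavily on a long non-matching line, so test with a failing input too.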
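A quick way to sanity-check the cleaner pattern is to run it over the sample lines from the question. This is only a sketch: the names PATTERN, samples and check are made up here for illustration and are not part of the answer.

```python
import re

# The "cleaner" pattern from the answer above, compiled once.
PATTERN = re.compile(r"^([A-Z]+[ ]?)+([A-Z]+[ /]*)*$")

# Sample lines from the question; the last one should be rejected.
samples = [
    "RAA RAA",
    "RAA RAA / OOO OOO",
    "RAA RAA / OOO OOO / ROCKY",
    "RAA RAA moves over there.",
]

def check(line):
    """Return True if the whole line matches PATTERN."""
    return PATTERN.match(line) is not None

# Map each sample line to whether it matched.
results = {line: check(line) for line in samples}
for line, ok in results.items():
    print(f"{ok!s:5}  {line}")
```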
Upvotes: 0
Reputation: 61369
That regex is far too general: not only does it match more than you want, but it has so many * quantifiers that the regex matcher will constantly and pointlessly backtrack to try some other combination. I haven't tried to work out the combinatorial tree, but it's at least several thousand attempts per non-matching line.
Specific is better, and making sure you don't backtrack over what you're committed to is better:
^RAA RAA(?: \/ OOO OOO(?: \/ ROCKY)?)?$
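As a rough illustration, here is that pattern applied to a block of text in Python. The re.MULTILINE flag and the sample text are assumptions added for the demo (they let ^ and $ anchor at each line); the pattern itself is unchanged.

```python
import re

# Pattern from the answer. In Python "/" needs no escaping, but "\/" is harmless.
# re.MULTILINE (an assumption for this demo) makes ^ and $ match at line boundaries.
LINE_RE = re.compile(r"^RAA RAA(?: \/ OOO OOO(?: \/ ROCKY)?)?$", re.MULTILINE)

# Sample text built from the lines in the question.
text = """\
RAA RAA
RAA RAA / OOO OOO
RAA RAA / OOO OOO / ROCKY
RAA RAA moves over there.
"""

# With no capturing groups, findall returns the full text of each match.
matches = LINE_RE.findall(text)
print(matches)
```

The last line is skipped because, once the optional groups fail, $ has to match right after "RAA RAA" and finds a space there instead.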
If the substrings aren't constant, you should specify them as completely as possible to avoid unnecessary backtracking.
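For example, if the assumption is that each segment is one or more all-caps words separated by single spaces, and segments are joined by " / ", a sketch could look like the following. The names WORDS, GENERAL_RE and is_header are hypothetical, and the real data may need a different definition of a word.

```python
import re

# Assumption: a "word" is one or more capitals, a segment is words joined by
# single spaces, and segments are joined by " / ". Every separator is
# mandatory, so there is only one way to split a given line.
WORDS = r"[A-Z]+(?: [A-Z]+)*"
GENERAL_RE = re.compile(rf"^{WORDS}(?: / {WORDS})*$")

def is_header(line):
    """Return True if the whole line consists of all-caps segments."""
    return GENERAL_RE.match(line) is not None

for line in ("RAA RAA / OOO OOO / ROCKY", "RAA RAA moves over there."):
    print(is_header(line), line)
```

Because nothing in this pattern is both optional and repeated, a failing line is rejected after roughly one pass instead of an explosion of retries.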
(The ?: markers are another small optimization: they tell the engine not to record the parenthesized matches for later extraction. If you do need the substrings, my guess is you don't want the / separators with them, so capture just the parts you want.)
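If the substrings are needed, one possible arrangement is to keep the outer groups non-capturing and wrap only the text parts in capturing groups. This is a sketch; CAPTURE_RE and parts are names invented for the example.

```python
import re

# Non-capturing (?: ...) groups handle the optional " / " structure;
# plain ( ... ) groups capture only the text between the separators.
CAPTURE_RE = re.compile(r"^(RAA RAA)(?: / (OOO OOO)(?: / (ROCKY))?)?$")

def parts(line):
    """Return the captured substrings, or None if the line does not match."""
    m = CAPTURE_RE.match(line)
    if m is None:
        return None
    # Groups that did not take part in the match come back as None; drop them.
    return [g for g in m.groups() if g is not None]

print(parts("RAA RAA / OOO OOO"))
```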
Upvotes: 2