Non-greedy mode in Python re module

Question

For some reason, I have to use the non-greedy mode of regex in Python. Here is the code (which might look a bit weird to you):

import re
# the string
s = u""
# the pattern
p = ur"(?P(?:)+(.*?)?)(?P)(?P)"
tmp = re.search(p, s).group()

The result is the whole string s, but I want the result to be and

I think it has something to do with the non-greedy mode of regex. Could anybody point out where I am going wrong?

gog · Accepted Answer

I think this is what you're looking for:

p = ur"(?P(?:)+(((?)?)(?P)(?P)"

Explanation: the problem with your original pattern is the catch-all fragment .*? between LBRACKET and RBRACKET. Yes, it's non-greedy, but laziness only matters when the engine can choose between two or more matches from the same starting position. In your pattern there is no choice, because there's only one RBRACKET followed by . Therefore, it matches ... and doesn't look any further, because that is a valid match and the shortest one available from that starting point. By adding a negative lookbehind, we explicitly tell the engine that the text matched by .*? must not contain RBRACKET, which forces it to try other combinations.
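To make the effect concrete, here is a minimal sketch in Python 3. The sample string and both patterns are hypothetical stand-ins (the original string from the question isn't reproduced here), and the plain words LBRACKET and RBRACKET plus a trailing "!" are only assumptions chosen to mimic the situation: a lazy catch-all followed by something that occurs only once.

```python
import re

# Hypothetical sample text, not the string from the question.
s = "LBRACKET a RBRACKET b LBRACKET c RBRACKET!"

# Lazy catch-all: the match still starts at the first LBRACKET and
# grows until "RBRACKET!" finally fits, swallowing the first RBRACKET.
naive = re.search(r"LBRACKET.*?RBRACKET!", s).group()

# Each "." is guarded by a negative lookbehind, so the run of characters
# cannot continue once a complete RBRACKET has just been consumed.
guarded = re.search(r"LBRACKET(?:(?<!RBRACKET).)*?RBRACKET!", s).group()

print(naive)    # the whole string
print(guarded)  # LBRACKET c RBRACKET!
```

With the guard in place, the attempt starting at the first LBRACKET fails outright, so the engine moves on and finds the match that starts at the second one.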
