Reputation: 171
String: XXaaaXXbbbXXcccXXdddOO
I want to match the minimal substring that begins with 'XX' and ends with 'OO'.
So I wrote the non-greedy regex: r'XX.*?OO'
>>> import re
>>> str = 'XXaaaXXbbbXXcccXXdddOO'
>>> re.findall(r'XX.*?OO', str)
['XXaaaXXbbbXXcccXXdddOO']
I thought it would return ['XXdddOO'], but it behaved 'greedily'.
Then I realized I must be mistaken: the pattern first matches the leftmost 'XX', and only after that does the non-greedy quantifier take effect.
But I still want to figure out how I can get my result ['XXdddOO'] directly. Any reply appreciated.
So far, the key point is actually not non-greediness as such, or rather, it is non-greediness as I understand it: the match should contain as few characters as possible between the left delimiter (XX) and the right delimiter (OO). And of course, the string is processed from left to right.
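To make the left-to-right behaviour concrete, here is a minimal Python sketch (the helper name `lazy_matches_by_start` is hypothetical, not from the question). It anchors the same lazy pattern at every position with `Pattern.match(string, pos)`: a normal search reports the match from the first start position that works, while the shortest candidate, which is what the question is after, only appears when the match starts at the last `XX`. The brute-force scan is only an illustration and can do quadratic work on long inputs.

```python
import re

LAZY = re.compile(r'XX.*?OO')


def lazy_matches_by_start(s):
    """Hypothetical helper: (start, text) for the lazy pattern anchored at each position."""
    results = []
    for i in range(len(s)):
        m = LAZY.match(s, i)  # match() only tries this exact start position
        if m:
            results.append((i, m.group()))
    return results


s = 'XXaaaXXbbbXXcccXXdddOO'

# search/findall scan left to right and accept the FIRST start that works
# (position 0 here); laziness only shortens the end of that match.
print(re.search(r'XX.*?OO', s).group())  # XXaaaXXbbbXXcccXXdddOO

for start, text in lazy_matches_by_start(s):
    print(start, text)
# 0 XXaaaXXbbbXXcccXXdddOO
# 5 XXbbbXXcccXXdddOO
# 10 XXcccXXdddOO
# 15 XXdddOO

# Picking the shortest candidate gives the "minimal" match from the question.
print(min((t for _, t in lazy_matches_by_start(s)), key=len))  # XXdddOO
```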
Upvotes: 2
Views: 776
Reputation: 89557
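The behaviour is due to the fact that the string is processed from left to right: the engine reports the first position where a match can start, and a match from the very first XX to the OO exists. A way to avoid the problem is to use a negated character class, so that the match cannot run across another XX: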
XX(?:(?=([^XO]+|O(?!O)|X(?!X)))\1)+OO
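A quick way to try the pattern above, sketched with Python's `re` module (the question's environment). One gotcha: the pattern contains a capturing group (needed for the `\1` backreference), so `re.findall` returns that group rather than the whole match; `re.finditer` with `group(0)` gives the full text.

```python
import re

s = 'XXaaaXXbbbXXcccXXdddOO'

# Each repetition consumes a run of characters other than X/O, a lone O not
# followed by O, or a lone X not followed by X. The lookahead + \1 trick makes
# each step behave like an atomic group, so a match cannot cross another 'XX'.
PATTERN = re.compile(r'XX(?:(?=([^XO]+|O(?!O)|X(?!X)))\1)+OO')

# findall returns group 1 (its value from the last repetition), not the match.
print(PATTERN.findall(s))  # ['ddd']

# finditer exposes the whole match via group(0).
print([m.group(0) for m in PATTERN.finditer(s)])  # ['XXdddOO']
```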
Upvotes: 1
Reputation: 398
Indeed, the issue is not with greedy/non-greedy… The solution suggested by @devnull should work, provided you want to avoid even a single X between your XX and OO groups.
Otherwise, you'll have to use a lookahead (i.e. a piece of regex that scans ahead in the string and checks whether a condition holds, without actually consuming any characters). Something like this:
re.findall(r'XX(?:.(?!XX))*?OO', str)
With this negative lookahead, you match (non-greedily) any character (.) that is not followed by XX…
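A minimal sketch comparing the lookahead pattern with a negated-class pattern, `XX[^X]+?OO` (the one proposed in the last answer of this thread; @devnull's exact pattern is not shown here). The second string, `XXaXbOO`, is a made-up input with a single X between the delimiters: the lookahead version accepts it, the negated class does not.

```python
import re

# Lazily match any character that is NOT followed by 'XX', as in this answer.
TEMPERED = re.compile(r'XX(?:.(?!XX))*?OO')
# Negated-class variant: no X at all is allowed between the delimiters.
NEGATED = re.compile(r'XX[^X]+?OO')

s = 'XXaaaXXbbbXXcccXXdddOO'
print(TEMPERED.findall(s))  # ['XXdddOO']
print(NEGATED.findall(s))   # ['XXdddOO']

# Hypothetical input: a single X sits between XX and OO.
t = 'XXaXbOO'
print(TEMPERED.findall(t))  # ['XXaXbOO']
print(NEGATED.findall(t))   # []
```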
Upvotes: 2
Reputation: 9644
Regexes work from left to right: given XXaaaXXdddOOiiiOO, non-greedy means that it will match XXaaaXXdddOO and not XXaaaXXdddOOiiiOO. If your data structure is that fixed, you could do:
XX[a-z]{3}OO
to select all patterns like XXiiiOO (it can be adjusted to fit your needs, with XX[^X]+?OO, for instance, selecting everything from the last XX pair before an OO up to that OO, with no X allowed in between: for example, in XXiiiXXdddFFcccOOlll it would match XXdddFFcccOO).
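The claims in this answer can be checked with a short Python sketch; the strings are the ones used in the answer and the question, and only the standard `re` module is assumed.

```python
import re

u = 'XXaaaXXdddOOiiiOO'
# Lazy stops at the first OO after the leftmost XX; greedy runs to the last OO.
print(re.search(r'XX.*?OO', u).group())  # XXaaaXXdddOO
print(re.search(r'XX.*OO', u).group())   # XXaaaXXdddOOiiiOO

# Fixed width: exactly three lowercase letters between the delimiters.
print(re.findall(r'XX[a-z]{3}OO', 'XXaaaXXbbbXXcccXXdddOO'))  # ['XXdddOO']

# No X allowed inside, so the match starts at the last XX before the OO.
print(re.findall(r'XX[^X]+?OO', 'XXiiiXXdddFFcccOOlll'))  # ['XXdddFFcccOO']
```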
Upvotes: 2