Reputation: 207
When using multiple {min,max} quantifiers in a regexp, I see that not all the combinations are tried:
/[XYZ]{15,20}[WXY]{15,20}/
I tested it on a pretty random string of 11k characters but the results are not what I expected: link
I had assumed that the first pair, [XYZ]{15}[WXY]{15}, is evaluated as TRUE and that the engine then jumps to the next one, which would be [XYZ]{15}[WXY]{16}. Thus the question:
Why does Perl take the first case, /[XYZ]{15}[WXY]{15}/, and then move on to /[XYZ]{16}[WXY]{15}/ instead of /[XYZ]{15}[WXY]{16}/?
Can I control this behaviour, or do I need to generate all combinations of such patterns and search for them one by one?
Thanks for any advice.
PS. This is somewhat linked to my previous post.
Upvotes: 0
Views: 2340
Reputation: 9611
Here is a visual example of how regex performs a match:
As you can see, regex performs matches left to right. This is especially important to take into account when using complex alternations such as (first|second|f1rst|s2cond).
So the regex engine will fully expand the left {15,20} before it moves on to the character class that follows, and it gives characters back only if the rest of the pattern then fails to match.
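To make that expansion visible, here is a minimal sketch. It uses Python's `re` module as a stand-in for Perl, on the assumption that both engines treat these greedy quantifiers the same way. The capture groups and the helper name `half_lengths` are added purely for illustration and are not part of the original pattern.

```python
import re

# The pattern from the question, with capture groups added (an assumption
# made for illustration) so the length of each half can be inspected.
PATTERN = re.compile(r"([XYZ]{15,20})([WXY]{15,20})")

def half_lengths(text):
    """Return (len of first half, len of second half) of the first match, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return len(m.group(1)), len(m.group(2))

# Plenty of input: both halves take their maximum.
print(half_lengths("X" * 40))  # (20, 20)

# Not enough left for the second half after 20: the first half gives
# characters back one at a time until the second half can reach its minimum.
print(half_lengths("X" * 32))  # (17, 15)
print(half_lengths("X" * 30))  # (15, 15)
```

Only one split is reported per starting position: the first one that succeeds when the left quantifier is tried from its maximum downwards.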
Upvotes: 0
Reputation: 93026
I think there is a misunderstanding on your side. The regex is not matched by first trying the minimum of both quantifiers.
The regex engine first tries to match the first character class as often as possible (quantifiers are greedy by default). [XYZ]{15,20} matches only if there are at least 15 such characters; the engine then keeps consuming further matching characters, up to a maximum of 20. Once it has found between 15 and 20 of [XYZ], it moves on to check the rest of the pattern.
Example:
(X{15,20})(X{15,20})
and a String of 35 "X"
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
You will get the first 20 "X" in the first group, and the following 15 "X" in the second group.
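The example above can be checked with a short sketch. It is written with Python's `re` module rather than Perl, assuming the two engines agree on greedy and lazy quantifiers for this pattern; the helper name `group_lengths` is hypothetical. The second call shows one way to influence the behaviour: appending `?` to a quantifier makes it lazy, so it starts from the minimum instead of the maximum.

```python
import re

def group_lengths(pattern, text):
    """Return the lengths of the two capture groups of the first match, or None."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return len(m.group(1)), len(m.group(2))

text = "X" * 35

# Greedy (default): the first group takes as many as it can.
print(group_lengths(r"(X{15,20})(X{15,20})", text))   # (20, 15)

# Lazy first quantifier ({15,20}?): the first group takes as few as it can,
# leaving the rest for the greedy second group.
print(group_lengths(r"(X{15,20}?)(X{15,20})", text))  # (15, 20)
```

Either way the engine still reports a single split per starting position, so this changes which combination is found, not how many.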
Upvotes: 6