mjp

Reputation: 207

multiple {min,max} quantifiers in regexp

When I use multiple {min,max} quantifiers in a regexp, I see that not all of the combinations are tried.

/[XYZ]{15,20}[WXY]{15,20}/

I tested it on a pretty random string of 11k characters but the results are not what I expected: link

I suppose that the first pair [XYZ]{15}[WXY]{15} is evaluated as TRUE and then it jumps to the next one which is [XYZ]{15}[WXY]{16}. Thus the question:

Why does Perl take the first case /[XYZ]{15}[WXY]{15}/ and then move on to /[XYZ]{16}[WXY]{15}/ instead of /[XYZ]{15}[WXY]{16}/?

Can I control this behaviour, or do I need to generate all combinations of such patterns and search for them one by one?

Thanks for any advice.

PS. This is somewhat linked to my previous post.

Upvotes: 0

Views: 2340

Answers (2)

Vasili Syrakis

Reputation: 9611

Here is a visual example of how regex performs a match:

As you can see, the regex engine matches from left to right. This is especially important to keep in mind when a pattern contains several alternatives, such as (first|second|f1rst|s2cond).

So the regex engine will fully expand the left {15,20} before it moves on to the character class that follows.
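The left-to-right behaviour can be checked with a small sketch. It is written in Python rather than Perl, on the assumption that Python's `re` module treats alternation and greedy `{min,max}` quantifiers the same way Perl does for these simple patterns. The short strings and the `{2,4}` bounds are illustrative stand-ins, not taken from the question.

```python
import re

# Alternation is tried left to right: the first alternative that lets the
# overall match succeed wins, even if a later one would match more text.
short_first = re.search(r"a|ab", "ab").group()
long_first = re.search(r"ab|a", "ab").group()

# A greedy {2,4} on the left takes as much as it can before the
# right-hand part of the pattern is tried.
m = re.fullmatch(r"([XY]{2,4})([XY]{2,4})", "XYXYXY")
left, right = m.group(1), m.group(2)

print(short_first, long_first)  # a ab
print(left, right)              # XYXY XY
```

Reordering the alternatives changes what is matched, and the left group ends up with four characters while the right group gets only the two that remain.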

Upvotes: 0

stema

Reputation: 93026

I think there is a misunderstanding on your side. The engine does not start by trying the minimum of both quantifiers.

The regex engine first tries to match the first character class as many times as possible (quantifiers are greedy by default). [XYZ]{15,20} needs at least 15 matching characters; once it has those, the engine keeps consuming further matching characters until it reaches the maximum of 20. Only when it has taken between 15 and 20 characters of [XYZ] does it move on to check the rest of the pattern. If the rest fails to match, the engine backtracks and gives up characters from the first quantifier one at a time, never going below 15.

Example:

(X{15,20})(X{15,20})

and a string of 35 "X" characters:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

You will get the first 20 "X" in the first group, and the following 15 "X" in the second group.

See it on Regexr
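The same example can be reproduced in a few lines. This is a sketch in Python rather than Perl, assuming Python's `re` module follows the same greedy and lazy quantifier rules for this pattern. The lazy variant `{15,20}?` is not part of the answer above; it is included only as one possible way to influence which combination is tried first.

```python
import re

text = "X" * 35

# Greedy (default): the first group grabs the maximum it can, 20,
# leaving 15 for the second group.
greedy = re.match(r"(X{15,20})(X{15,20})", text)
greedy_lengths = (len(greedy.group(1)), len(greedy.group(2)))

# Lazy first quantifier ({15,20}?): the first group stops at the minimum,
# 15, and the greedy second group then takes the remaining 20.
lazy = re.match(r"(X{15,20}?)(X{15,20})", text)
lazy_lengths = (len(lazy.group(1)), len(lazy.group(2)))

print(greedy_lengths)  # (20, 15)
print(lazy_lengths)    # (15, 20)
```

Either way the engine reports only one match per starting position, so enumerating every (min, max) combination would still need separate patterns or additional logic.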

Upvotes: 6
