Reputation: 1535
My understanding of *
is that it consumes as many characters as possible (greedily) but "gives back" when necessary. Therefore, in a*a+
, a*
would give one (or maybe more?) character back to a+
so it can match.
However, in (a{2,3})*
, why doesn't the first "instance" of a{2,3}
gives a character to the second "instance" so the second one can match?
Also, in (a{2,3})*a{2,3}
the first part does seem to give a character to the second part.
Upvotes: 4
Views: 264
Reputation: 652
A simple workaround for your question is to match aaaa
with regex ^(a{2,3})*$
.
Your problem is that:
In the case of
(a{2,3})*
, regex doesn't seem to consume as much character as possible.
I suggest not to think in giving back characters. Instead, the key is acceptance.
Once regex accept your string, the matching will be over. The pattern a{2,3}
only matches aa
or aaa
. So in the case of matching aaaa
with (a{2,3})*
, the greedy engine would match aaa
. And then, it can't match more a{2,3}
because there is only one a
remained. Though it's able for regex engine to do backtrack and match an extra a{2,3}
, it wouldn't. aaa
is now accepted by the regex, thus regex engine would not do expensive backtracking.
If you add an $
to the end of the regex, it simply tells regex engine that a partly match is unacceptable. Moreover, it's easy to explain the (a{2,3})*a{2,3}
case with accepting and backtracking.
Upvotes: 2
Reputation: 19315
backtracking is done only if match fails however aaa
is a valid match, a negative lookahead (?!a)
can be use to prevent the match be followed by a a
.
(aaa?)*
and
(aaa?)*(?!a)
Upvotes: 0
Reputation: 113866
The main problem is this:
My understanding of * is that it consumes as many characters as possible (greedily) but "gives back" when necessary
This is completely wrong. It is not what greedy means.
Greedy simply means "use the longest possible match". It does not give anything back.
Once you interpret the expressions with this new understanding everything makes sense.
a*a+
- zero or more a
followed by one or more a
(a{2,3})*a{2,3}
- zero or more of either two or three a
followed by either two or three a
(note: the KEY THING to remember is "zero or more", the first part not matching any character is considered a match)
(a{2,3})*
- zero or more of either two or three a
(this means that after matching three a
s the last single a
left cannot match)
Upvotes: 1