Reputation: 8424
This regex:
(a)?b\1c
does not match "bc" while this one:
(a?)b\1c
does match it. Why is this? I thought these two expressions were identical.
Upvotes: 6
Views: 177
Reputation: 68790
In your first example, (a)?b\1c, \1 refers to your (a) group, which means an a must actually have been captured for \1 to match:
abac will match
bac won't match
bc won't match
In your second example, (a?)b\1c, \1 refers to (a?), where the a is optional:
abac will match
bac won't match
bc will match
The back reference doesn't care about the ? outside the group (in the first example); it only takes into account what is inside the parentheses.
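The lists above can be checked with a short sketch. It assumes Python's `re` module, since the question does not name a regex flavor, and the helper `matches` and the variable names are made up for illustration. Some engines (JavaScript, for example) treat a backreference to an unset group as matching the empty string, so results may differ there.

```python
import re

# Flavor assumption: Python's `re`, where a backreference to a group
# that did not participate in the match fails.
OPTIONAL_GROUP = re.compile(r"(a)?b\1c")   # the ? is outside the group
OPTIONAL_INSIDE = re.compile(r"(a?)b\1c")  # the ? is inside the group


def matches(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


# Map each test string to (result for (a)?b\1c, result for (a?)b\1c).
results = {
    text: (matches(OPTIONAL_GROUP, text), matches(OPTIONAL_INSIDE, text))
    for text in ("abac", "bac", "bc")
}

for text, (outside, inside) in results.items():
    print(f"{text!r}: (a)?b\\1c -> {outside}, (a?)b\\1c -> {inside}")
```

Only bc separates the two patterns: abac matches both and bac matches neither.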
Upvotes: 6
Reputation: 19423
It's a bit confusing, but let's see. I will start with the second regular expression:
(a?)b\1c
When this tries to match bc, it first tries (a?), but since there is no a in bc, () will capture the empty string "". So when we later refer to it using \1, \1 will match the empty string, which is always possible.
Now let's go to the first regular expression:
(a)?b\1c
(a) will try to match a but fails. Since the entire group (a)? is optional, the regular expression continues. It then tries to find a b, which succeeds, and then \1. But (a)? didn't match anything, not even the empty string, so the match fails.
So the difference between the two regexes is that in (a?) the capturing group captures an empty string, which can be referenced later and matched successfully using \1, whereas (a)? creates an optional capturing group that didn't match anything, so referencing it later using \1 will always fail unless the group actually matched an a.
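The distinction between "captured the empty string" and "did not match anything" can be observed directly. This is a minimal sketch assuming Python's `re` module (the thread does not say which engine is in use, and other flavors may treat an unset group differently). The variable names are illustrative only.

```python
import re

# Assumption: Python's `re` flavor.

# (a?) always participates in the match; with no "a" present it captures "".
inside = re.fullmatch(r"(a?)b\1c", "bc")
print(inside.group(1) == "")

# Without the backreference, the optional group lets the match succeed,
# but group 1 never participated, so it is None rather than "".
outside_no_ref = re.fullmatch(r"(a)?bc", "bc")
print(outside_no_ref.group(1) is None)

# With the backreference there is nothing for \1 to match, so the match fails.
outside = re.fullmatch(r"(a)?b\1c", "bc")
print(outside is None)
```

In Python, an unset group is reported as `None` and a group that captured the empty string as `""`, which is the same difference the backreference is sensitive to.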
Upvotes: 3
Reputation: 3838
In the first version, the parentheses capture a, so \1 must match a.
In the second regex, the parentheses capture a?, which means "0 or 1 a", so \1 matches whatever was captured: either a or the empty string.
Since a is optional inside the group in the second regex, the group captures the empty string and bc matches the rest of the pattern (b\1c).
Upvotes: 2