Reputation: 6632
I am looking at the regular expression ^\s*(_?)(\S+?)\1\s*$ from injector.js.
I have been able to understand how the string "_non_" is matched. The first capturing group consists of _, the second group consists of non, and the reference to the result of the first capture group gets you an _. So, the first group is _, the second group is non, and the \1 back-reference matches the closing _.
However, I have not been able to understand how the strings "_", "_non" and "__" are matched by the second group, given that the \1 reference in the expression would expect an _ at the end whenever there is an _ at the beginning.
Upvotes: 3
Views: 102
Reputation: 32517
Pattern: ^\s*(_?)(\S+?)\1\s*$
Overall, this pattern:
^         start at the beginning of the string
\s*       match 0 or more whitespace chars
(_?)      match and capture 0 or 1 underscore (capture group 1)
(\S+?)    non-greedy match and capture 1 or more non-whitespace chars (capture group 2)
\1        match whatever was captured in capture group 1
\s*       match 0 or more whitespace chars
$         match end of line/string
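As a quick way to check the breakdown above, here is a minimal JavaScript sketch that applies the pattern to a whitespace-padded string and to a plain name. The helper name captures and the sample strings are made up for illustration; they do not come from injector.js.

```javascript
// The pattern under discussion, written as a JavaScript regex literal.
const ARG_PATTERN = /^\s*(_?)(\S+?)\1\s*$/;

// Hypothetical helper: returns both captures, or null when there is no match.
function captures(subject) {
  const m = ARG_PATTERN.exec(subject);
  return m === null ? null : { group1: m[1], group2: m[2] };
}

// Surrounding whitespace is consumed by \s*, the underscores by (_?) and \1.
console.log(captures("  _name_  ")); // { group1: '_', group2: 'name' }

// No leading underscore: group 1 captures the empty string.
console.log(captures("name"));       // { group1: '', group2: 'name' }
```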
Subject: _
Group 1:
Group 2: _
Initially the _ is matched by the first capture group. But then the engine moves on to the second capture group, which needs at least one char to match, so the engine backtracks and takes the char back from the first capture group: the ? makes the first group optional, and _ is a non-space char. Since ultimately nothing was captured in group 1 (because group 2 had to be satisfied), the \1 back-reference has nothing to match and succeeds on the empty string.
Subject: _non
Group 1:
Group 2: _non
Initially the _ is matched in group 1, then non is matched in group 2. Then the engine looks for a _ to satisfy the \1 reference, and there is none, so the engine backtracks, removes the _ from group 1, and matches it in group 2 instead.
Subject: _non_
Group 1: _
Group 2: non
Similar to the previous: initially the _ is matched in group 1, then non is matched in group 2. Then the engine looks for a _ to satisfy the \1 reference, which it finds, so group 1 keeps its _ and group 2 just has non.
Subject: __
Group 1:
Group 2: __
This is essentially the same as the first "_" example. Initially the first _ is matched in group 1. Then the second _ is matched in group 2. Then \1 tries to match another _, since group 1 captured one, but there is none. Group 2 requires at least 1 char but can have more, so the regex engine backs up and puts group 1's match into group 2.
Subject: _ _
Group 1:
Group 2:
This results in no match. The engine starts out putting the first _ into group 1, but then fails at putting the space into group 2. So it backs up and attempts to put the first _ into group 2. Since group 1 is now empty, there is nothing for \1 to match. The space is then matched by \s*, but the match fails on the final _ because the pattern allows only whitespace before the end of the string.
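The five walkthroughs above can be reproduced with a short JavaScript sketch. It only reports the final captures (the intermediate backtracking steps are not observable from the match result), and the variable names are arbitrary.

```javascript
const pattern = /^\s*(_?)(\S+?)\1\s*$/;

// The five subjects discussed above.
const subjects = ["_", "_non", "_non_", "__", "_ _"];

// For each subject, collect [group 1, group 2], or null if there is no match.
const results = subjects.map((subject) => {
  const m = subject.match(pattern);
  return m ? [m[1], m[2]] : null;
});

subjects.forEach((subject, i) => {
  console.log(JSON.stringify(subject), "->", JSON.stringify(results[i]));
});
// "_" -> ["","_"]
// "_non" -> ["","_non"]
// "_non_" -> ["_","non"]
// "__" -> ["","__"]
// "_ _" -> null
```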
Sidenote
You asked in a comment:
if it matches the _ for the first group does it have to match an _ in the \1. Does \1 refer to the expression or the result of the expression?
It references the result of the expression (what is actually captured), not the expression itself.
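A small sketch of that difference, using two made-up patterns that are not part of injector.js: one repeats the captured text with \1, the other repeats the sub-expression itself.

```javascript
// \1 must match the same text that group 1 actually captured.
const withBackref = /^(a|b)\1$/;

// Repeating the sub-expression lets the second character differ from the first.
const withRepeat = /^(a|b)(a|b)$/;

console.log(withBackref.test("aa")); // true
console.log(withBackref.test("ab")); // false
console.log(withRepeat.test("ab"));  // true
```

The same rule explains the earlier cases: when group 1 captures the empty string, \1 matches the empty string too.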
Upvotes: 5