Reputation: 1239
I'm getting some unexpected results from a regular expression, which is meant to be replacing the classname on a namespace. The replacement appears to happen twice, so that the classname getting replaced is duplicated (see example below).
I've actually resolved the problem by changing the regex to match 1 or more (+) rather than 0 or more (*), which is actually more accurate for what I want.
However, I'm a little confused as to why I was getting an issue in the first place.
Here is an example of the problem:
$classns = 'components\groups\GroupsController';
$newclass = 'GroupsAccess';
$classns = preg_replace('/[^\\\\]*$/', $newclass, $classns);
echo $classns;
Result
components\groups\GroupsAccessGroupsAccess
Expected
components\groups\GroupsAccess
Is it possible that the * is matching a word boundary or something of that nature?
The confusing part for me is that a preg_match using the same regex shows only one result, so it would appear to be something specific to how preg_replace runs the regex.
e.g.
preg_match('/[^\\\\]*$/', $classns, $m);
var_dump($m);
Result
array(1) { [0]=> string(12) "GroupsAccess" }
Upvotes: 3
Views: 1817
Reputation: 361729
Narrowing it down, this also shows two matches:
preg_match_all('/a*$/', 'a', $m);
Python has the same behavior:
>>> re.findall('a*$', 'a')
['a', '']
So does Perl:
my @m = 'a' =~ /a*$/g;
foreach (@m) { print "$_\n"; }
a
<blank>
It appears that the regex engines match both 'a' and the empty string '' that follows it. Technically this is correct, although it is surprising. 'a' is a string that is anchored at the end of the search string, and so is ''.
One basic rule of matching is that matches don't overlap. Once a match has been found, the regex engine continues searching for the next match at the end of the previous match. What I didn't expect is that the anchor $ can be re-used, presumably since it is a zero-width assertion and not an actual substring match.
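To make the "no overlap, but the anchor is re-used" point concrete, here is a minimal Python sketch that prints each match together with its start and end offsets. It only uses re.finditer from the standard library; the variable name spans is just for illustration.

```python
import re

# Collect every match of a*$ in 'a' along with its (start, end) span.
spans = [(m.group(), m.span()) for m in re.finditer(r'a*$', 'a')]

# The first match consumes the 'a' (offsets 0 to 1). The second match is
# zero-width and starts exactly where the first one ended (offset 1), so
# the two matches do not overlap, yet both satisfy the $ anchor.
print(spans)  # [('a', (0, 1)), ('', (1, 1))]
```

The second span has equal start and end offsets, which is what a match on the empty string looks like.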
Upvotes: 2
Reputation: 92996
The * is not matching a word boundary, it is matching the empty string.
Your expression first matches the "GroupsController" part of components\groups\GroupsController. The $ is an anchor: it matches a position, namely the end of the string (or just before a \n at the end of the string), and does not consume any characters.
So after the first match, the regex engine is positioned after the last "r", at the end of the string, when it tries to match your regex again. And it finds one more match: 0 occurrences of a non-backslash character (the empty string), followed by the end of the string.
Then it moves on, recognizes the end of the string and finishes.
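The same walk-through can be reproduced in Python as a rough sketch. It assumes Python 3.7 or later, where re.sub also replaces an empty match that directly follows a non-empty one (older versions skipped it, so they would not show the duplication). The variable names are made up for the example, and count=1 is used here as the Python counterpart of limiting the number of replacements.

```python
import re

subject = r'components\groups\GroupsController'
new_class = 'GroupsAccess'

# Where the two matches of [^\\]*$ land: the class name, then the empty
# string at the very end of the subject.
spans = [m.span() for m in re.finditer(r'[^\\]*$', subject)]

# Replacing every match with * substitutes both of them (Python 3.7+).
star = re.sub(r'[^\\]*$', new_class, subject)

# With + the empty string can no longer match, so there is one replacement.
plus = re.sub(r'[^\\]+$', new_class, subject)

# Keeping * but allowing only a single replacement has the same effect.
once = re.sub(r'[^\\]*$', new_class, subject, count=1)

# A single search stops at the first match, which is why a one-shot match
# call shows only one result.
first_only = re.search(r'[^\\]*$', subject).group()

print(spans)
print(star)
print(plus)
print(once)
print(first_only)
```

Using + is the cleaner fix when an empty class name should never be accepted; limiting the replacement count only hides the second, empty match.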
Upvotes: 5