Paul S

Reputation: 1239

preg_replace double replacement

I'm getting some unexpected results from a regular expression, which is meant to be replacing the classname on a namespace. The replacement appears to happen twice, so that the classname getting replaced is duplicated (see example below).

I've resolved the problem by changing the regex to match one or more (+) rather than zero or more (*), which is also more accurate for what I want.

However, I'm a little confused as to why I was getting an issue in the first place.

Here is an example of the problem:

$classns  = 'components\groups\GroupsController';
$newclass = 'GroupsAccess';
$classns = preg_replace('/[^\\\\]*$/', $newclass, $classns);
echo $classns;

Result

components\groups\GroupsAccessGroupsAccess

Expected

components\groups\GroupsAccess

Is it possible that the * is matching a word boundary or something of that nature?

The confusing part for me is that a preg_match using the same regex shows only one result, so it would appear to be something specific to how preg_replace runs the regex.

e.g.

preg_match('/[^\\\\]*$/', $classns, $m);
var_dump($m);

Result

array(1) { [0]=> string(12) "GroupsAccess" }

Upvotes: 3

Views: 1817

Answers (2)

John Kugelman

Reputation: 361729

Narrowing it down, this also shows two matches:

preg_match_all('/a*$/', 'a', $m);

Python has the same behavior:

>>> re.findall('a*$', 'a')
['a', '']

So does Perl:

my @m = 'a' =~ /a*$/g;
foreach (@m) { print "$_\n"; }
a
<blank>

It appears that the regex engines match both 'a' and the empty string '' that follows it. Technically this is correct, although it is surprising. 'a' is a string that is anchored at the end of the search string, and so is ''.

One basic rule of matching is that matches don't overlap. Once a match has been found, the regex engine continues searching for the next match at the end of the previous one. What I didn't expect is that the anchor $ can be reused, presumably because it is a zero-width assertion and not an actual substring match.
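A minimal Python sketch of the two matches described above, listing where each one starts and ends. It assumes Python 3.7 or later, where re.sub also replaces an empty match that sits directly after a non-empty one; older versions skipped that empty match in re.sub, so the doubled replacement would not show up there.

```python
import re

pattern = re.compile(r"a*$")

# Each match as (start, end, text). The second match is empty and
# begins exactly where the first one ended: at the end of the string.
matches = [(m.start(), m.end(), m.group()) for m in pattern.finditer("a")]
print(matches)

# Substitution replaces both matches (Python 3.7+ behaviour).
replaced = pattern.sub("X", "a")
print(replaced)
```

The two spans do not overlap: the first consumes the character, the second consumes nothing, and both satisfy $ at the same position.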

Upvotes: 2

stema

Reputation: 92996

The * is not matching a word boundary, it is matching the empty string.

Your expression first matches the final segment, GroupsController, in

components\groups\GroupsController

The $ is an anchor: it matches a position, namely the end of the string (or just before a \n at the very end of the string).

So after the first match, the regex engine is positioned after the last "r", at the end of the string, when it tries to match your regex again. It finds one more match: zero occurrences of a non-backslash character (the empty string) followed by the end of the string.

Then it moves on, recognizes the end of the string, and finishes.
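The walk-through above can be checked with a short Python sketch (Python 3.7+ assumed, since older versions of re.sub skip the trailing empty match). It reuses the strings from the question; replace_last_segment is a hypothetical helper name introduced only for illustration, and the + variant is the fix the asker settled on.

```python
import re

classns = r"components\groups\GroupsController"
newclass = "GroupsAccess"


def replace_last_segment(pattern: str) -> str:
    # Hypothetical helper: substitute every match of `pattern` in classns.
    return re.sub(pattern, newclass, classns)


# `*` permits a second, empty match at the end of the string,
# so the replacement is inserted twice.
doubled = replace_last_segment(r"[^\\]*$")

# `+` requires at least one non-backslash character,
# so the empty match at the end is rejected.
single = replace_last_segment(r"[^\\]+$")

print(doubled)
print(single)
```

With +, the engine still tries again at the end of the string after the first match, but there is nothing left to consume, so no second replacement happens.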

Upvotes: 5
