Reputation: 528
I've noticed strange preg_replace() behaviour when the text I'm inserting starts with a digit: the output loses that first digit, and the opening <a> tag along with it. I'm seeing it in PHP 5.6.36 and PHP 7.0.30.
This code:
<?php
$items = array(
    '1234567890' => '<a href="http://example.com/1234567890">1234567890</a>',
    '1234567890 A' => '<a href="http://example.com/123456789-a">1234567890 A</a>',
    'A 1234567890' => '<a href="http://example.com/a-1234567890">A 1234567890</a>',
    'Only Text' => '<a href="http://example.com/only-text">Only Text</a>',
);
foreach( $items as $title => $item ) {
    $search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
    $replace = '$1' . $title . '$2';
    // Preserve for re-use.
    $_item = $item;
    // Doesn't work -- the titles starting with a number are wonky.
    $item = preg_replace( $search, $replace, $item );
    echo 'Broken: ' . $item . PHP_EOL;
    // Ugly hack to fix the issue.
    if ( is_numeric( substr( $title, 0, 1 ) ) ) {
        $title = ' ' . $title;
    }
    $replace = '$1' . $title . '$2';
    $_item = preg_replace( $search, $replace, $_item );
    echo 'Fixed: ' . $_item . PHP_EOL;
}
produces this result:
Broken: 234567890</a>
Fixed: <a href="http://example.com/1234567890"> 1234567890</a>
Broken: 234567890 A</a>
Fixed: <a href="http://example.com/123456789-a"> 1234567890 A</a>
Broken: <a href="http://example.com/a-1234567890">A 1234567890</a>
Fixed: <a href="http://example.com/a-1234567890">A 1234567890</a>
Broken: <a href="http://example.com/only-text">Only Text</a>
Fixed: <a href="http://example.com/only-text">Only Text</a>
I've tested my regex online at https://regex101.com/, and as far as I can tell, it's written correctly. (It's not terribly complex, IMHO.)
Is this a PHP bug, or am I not completely grokking my regex?
Upvotes: 0
Views: 71
Reputation: 91430
To avoid this behaviour, change $1 to ${1}, and likewise $2 to ${2}. The braces mark where the group number ends, so a digit that follows is treated as literal text:
foreach( $items as $title => $item ) {
    $search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
    $replace = '${1}' . $title . '${2}';
    ...
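For completeness, here is a minimal, self-contained sketch of the same fix. The helper name replace_link_text() and the sample input are my own assumptions for illustration; only the pattern and the ${1}/${2} replacement come from this thread.

```php
<?php
// Hypothetical helper (not from the original post): swaps the link text
// while keeping the surrounding <a ...> and </a> tags.
function replace_link_text( $html, $title ) {
    $search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
    // The braces end the group number, so a leading digit in $title
    // can no longer be read as part of the reference.
    $replace = '${1}' . $title . '${2}';
    return preg_replace( $search, $replace, $html );
}

// Prints: <a href="http://example.com/1234567890">1234567890</a>
echo replace_link_text(
    '<a href="http://example.com/1234567890">old</a>',
    '1234567890'
) . PHP_EOL;
```

Note that a title containing its own $n or \n sequences would still be interpreted as a reference, so this only covers the leading-digit case.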
Upvotes: 2
Reputation: 528
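It appears that my $replace parameter ('$1' . $title . '$2') is to blame. Because $title starts with a digit, that digit runs straight into the $1, so $replace ends up as $11234...$2 and PHP reads $11 as a reference to group 11. That group doesn't exist, so it expands to an empty string, which is why both the opening tag and the first digit vanish.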
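A stripped-down way to see the effect, as a sketch: the sample link and the literal '12' stand in for a real item and title and are my own placeholders.

```php
<?php
$search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
$html   = '<a href="http://example.com/x">old</a>';

// '$1' . '12' . '$2' is the string '$112$2'. PHP takes up to two digits
// for a reference, so it sees group 11 (empty), a literal "2", then group 2.
$ambiguous = preg_replace( $search, '$1' . '12' . '$2', $html );

// With braces the reference stops at group 1 and "12" stays literal.
$explicit = preg_replace( $search, '${1}' . '12' . '${2}', $html );

echo $ambiguous . PHP_EOL; // 2</a>
echo $explicit . PHP_EOL;  // <a href="http://example.com/x">12</a>
```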
Solution:
$replace = '$1%s$2';
.
.
.
echo sprintf( $item, $title );
...which has the advantage of not introducing spurious spaces into my page title links.
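One caveat with the sprintf() route: any % already in the link (for example a URL-encoded href) would be read as a format specifier. A different technique that sidesteps replacement-string parsing altogether is preg_replace_callback(). The sketch below is only an illustration; set_link_text() is a hypothetical name and the sample title is made up to show that $ and % pass through untouched.

```php
<?php
// Hypothetical helper (not from the original post).
function set_link_text( $html, $title ) {
    $search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
    return preg_replace_callback(
        $search,
        function ( $m ) use ( $title ) {
            // $m[1] and $m[2] hold the captured tags. The return value is
            // inserted as-is, so nothing in $title is treated as a reference.
            return $m[1] . $title . $m[2];
        },
        $html
    );
}

// Prints: <a href="http://example.com/a">100% $1 off</a>
echo set_link_text( '<a href="http://example.com/a">old</a>', '100% $1 off' ) . PHP_EOL;
```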
Upvotes: 0