Pat J

Reputation: 528

preg_replace doesn't work as expected with numeric string data

I've noticed a strange preg_replace() behaviour when I'm dealing with strings that start with a numeric character: The replacement strings have their first character (first digit) cut off. I'm seeing it in PHP 5.6.36 and PHP 7.0.30.

This code:

<?php

$items = array(
    '1234567890'   => '<a href="http://example.com/1234567890">1234567890</a>',
    '1234567890 A' => '<a href="http://example.com/123456789-a">1234567890 A</a>',
    'A 1234567890' => '<a href="http://example.com/a-1234567890">A 1234567890</a>',
    'Only Text'    => '<a href="http://example.com/only-text">Only Text</a>',
);

foreach( $items as $title => $item ) {
    $search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
    $replace = '$1' . $title . '$2';

    // Preserve for re-use.
    $_item = $item;

    // Doesn't work -- the titles starting with a number are wonky.
    $item = preg_replace( $search, $replace, $item );
    echo 'Broken: ' . $item . PHP_EOL;

    // Ugly hack to fix the issue.
    if ( is_numeric( substr( $title, 0, 1 ) ) ) {
        $title = ' ' . $title;
    }
    $replace = '$1' . $title . '$2';
    $_item = preg_replace( $search, $replace, $_item );
    echo 'Fixed:  ' . $_item . PHP_EOL;
}

produces this result:

Broken: 234567890</a>
Fixed:  <a href="http://example.com/1234567890"> 1234567890</a>
Broken: 234567890 A</a>
Fixed:  <a href="http://example.com/123456789-a"> 1234567890 A</a>
Broken: <a href="http://example.com/a-1234567890">A 1234567890</a>
Fixed:  <a href="http://example.com/a-1234567890">A 1234567890</a>
Broken: <a href="http://example.com/only-text">Only Text</a>
Fixed:  <a href="http://example.com/only-text">Only Text</a>

I've tested my regex online at https://regex101.com/, and as far as I can tell, it's written correctly. (It's not terribly complex, IMHO.)

Is this a PHP bug, or am I not completely grokking my regex?

Upvotes: 0

Views: 71

Answers (2)

Toto

Reputation: 91430

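To avoid this behaviour, write the references as ${1} and ${2} instead of $1 and $2. The braces mark where the group number ends, so a digit at the start of $title can't be read as part of it.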

foreach( $items as $title => $item ) {
    $search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
    $replace = '${1}' . $title . '${2}';
    ...
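
A minimal sketch of the difference, reusing the pattern from the question. The sample link text ("old") and the variable names $broken and $fixed are only illustrative assumptions:

```php
<?php
$item   = '<a href="http://example.com/1234567890">old</a>';
$title  = '1234567890';
$search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';

// '$1' followed by a digit is read as the two-digit reference $11.
// Group 11 doesn't exist, so it is replaced with an empty string.
$broken = preg_replace( $search, '$1' . $title . '$2', $item );

// The braces end the reference right after the group number.
$fixed = preg_replace( $search, '${1}' . $title . '${2}', $item );

echo $broken . PHP_EOL; // 234567890</a>
echo $fixed . PHP_EOL;  // <a href="http://example.com/1234567890">1234567890</a>
```

The single quotes matter here: inside double quotes PHP itself would try to interpolate ${1} before preg_replace() ever sees it.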

Upvotes: 2

Pat J

Reputation: 528

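It appears that my $replace parameter ('$1' . $title . '$2') is to blame. Since $title starts with a digit, that digit gets appended to the $1, so $replace looks like $11234...$2. PHP then reads the start as a reference to group 11, which doesn't exist, so it's replaced with nothing and the first digit disappears along with the opening tag.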

Solution:

$replace = '$1%s$2';
.
.
.
echo sprintf( $item, $title );

...which has the advantage of not introducing spurious spaces into my page title links.
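
For completeness, here is a rough end-to-end sketch of the placeholder approach. The shortened $items array, the "old title" link text, and the $template and $results names are assumptions made for the example, not part of the original code:

```php
<?php
$items = array(
    '1234567890' => '<a href="http://example.com/1234567890">old title</a>',
    'Only Text'  => '<a href="http://example.com/only-text">old title</a>',
);

$search  = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
$replace = '$1%s$2'; // '%' is not a digit, so it can't extend the $1 reference

$results = array();
foreach ( $items as $title => $item ) {
    // First pass: swap the old link text for a %s placeholder.
    $template = preg_replace( $search, $replace, $item );

    // Second pass: drop the real title into the placeholder.
    $results[] = sprintf( $template, $title );
}

echo implode( PHP_EOL, $results ) . PHP_EOL;
```

One caveat: sprintf() treats every % in the string as the start of a format specifier, so this only works as written if the link markup contains no literal % (a percent-encoded URL, for example, would need its % characters doubled first).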

Upvotes: 0
