You Old Fool

Reputation: 22950

rework preg_replace with preg_replace_callback

I've seen many answers about this, but as this case is a bit specific, I still need some help. I'm trying to update Blogstudio's Fix Serialization script, which calls preg_replace() with the /e modifier.

The code in question is this:

$data = preg_replace('!s:(\d+):([\\\\]?"[\\\\]?"|[\\\\]?"((.*?)[^\\\\])[\\\\]?");!e', "'s:'.strlen(unescape_mysql('$3')).':\"'.unescape_quotes('$3').'\";'", $data);

What confuses me is:

  1. Are those functions there to deal with the quotes that the /e modifier escapes, or not?
  2. What should the result be when there is no $3?

I rewrote it as follows, but I still run into warnings and other problems, so the result is not what was intended:

$data = preg_replace_callback(
    '!s:(\d+):([\\\\]?"[\\\\]?"|[\\\\]?"((.*?)[^\\\\])[\\\\]?");!',
    function($d) {
        $length = strlen(unescape_mysql($d[3]));
        $value = unescape_quotes($d[3]);
        $result = 's:' . $length . ':\"' . $value . '\";';
        return $result;
    },
    $data
);

Upvotes: 1

Views: 118

Answers (1)

Casimir et Hippolyte

Reputation: 89564

The problem:

s:(\d+): # group 1
(        # group 2
    [\\\\]?"[\\\\]?"
  |
    [\\\\]?"
    ((.*?)[^\\\\]) # group 3 (and 4)
    [\\\\]?"
)
;

As you can see, group 2 contains an alternation with two branches. Groups 3 and 4 sit in the second branch, so they are not defined when the first branch succeeds (the empty-string case).
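A minimal sketch of that behaviour, using the pattern from the question without the e modifier. The two sample inputs and the variable names $empty and $filled are made up for illustration:

```php
<?php
// Pattern from the question, minus the deprecated e modifier.
$pattern = '!s:(\d+):([\\\\]?"[\\\\]?"|[\\\\]?"((.*?)[^\\\\])[\\\\]?");!';

// Empty string: the first branch matches, so groups 3 and 4 take no part.
preg_match($pattern, 's:0:"";', $empty);

// Non-empty string: the second branch matches and group 3 holds the content.
preg_match($pattern, 's:5:"hello";', $filled);

var_dump(isset($empty[3]));  // index 3 is missing from the match array
var_dump(isset($filled[3])); // index 3 is present
```

Reading $d[3] in the callback without checking it first is therefore a likely source of the warnings on empty strings.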

Let's clean up the pattern by removing the useless capture groups:

s:\d+:
(?:
    [\\\\]? " [\\\\]? "
  |
    [\\\\]? "
    (.*? [^\\\\])      # group 1
    [\\\\]? "
)
;

Now the target is group 1, but the branch problem remains. There are two possible ways to solve it:

  • test whether the index exists with isset in the callback function.
  • change the pattern so that group 1 is defined in both branches, using the branch reset feature.

First way:

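$data = preg_replace_callback(
   '~s:\K\d+:(?:[\\\\]?"[\\\\]?"|[\\\\]?"(.*?[^\\\\])[\\\\]?");~',
   function ($m) {
     // Group 1 only exists when the second (non-empty) branch matched.
     // The quotes are written unescaped: inside single-quoted PHP strings,
     // \" would put a literal backslash into the output.
     return isset($m[1])
       ? strlen(unescape_mysql($m[1])) . ':"' . $m[1] . '";'
       : '0:"";';
   },
   $data
);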
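The pattern starts with s:\K, which is why the callback does not return the leading "s:" itself. A small sketch of what \K does, with a made-up input string and a hard-coded replacement length:

```php
<?php
// \K drops everything matched so far from the reported match,
// so the leading "s:" survives the replacement untouched.
$withK = preg_replace('~s:\K\d+~', '5', 's:99:"hello";');

// Without \K the "s:" is part of the match and has to be written back by hand.
$withoutK = preg_replace('~s:\d+~', 's:5', 's:99:"hello";');

var_dump($withK === $withoutK); // both approaches give the same string
```

Either form works; \K simply keeps the replacement shorter.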

Second way (with the branch reset feature):

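$data = preg_replace_callback(
   '~s:\K\d+:(?|[\\\\]?"[\\\\]?"()|[\\\\]?"(.*?[^\\\\])[\\\\]?");~',
   function ($m) {
     // Group 1 is always set here; it is the empty string for the first branch.
     // The quotes are written unescaped: inside single-quoted PHP strings,
     // \" would put a literal backslash into the output.
     return strlen(unescape_mysql($m[1])) . ':"' . $m[1] . '";';
   },
   $data
);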

In a branch reset group, capture groups have the same numbers in each branch. To solve your problem, you only need to create an empty capture group in the first branch:

(?|  # open a branch reset group
     foo
     ()  # capture group 1
  |
     bar
     (baz) # capture group 1 (too)
)
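The same toy pattern can be checked with preg_match. This is only a sketch; the subject strings and the variable names $first and $second are chosen for illustration:

```php
<?php
// Branch reset group: both branches number their capture group as 1.
$pattern = '~(?|foo()|bar(baz))~';

preg_match($pattern, 'foo', $first);     // first branch matches
preg_match($pattern, 'barbaz', $second); // second branch matches

var_dump($first[1]);  // group 1 exists and is the empty string
var_dump($second[1]); // group 1 holds "baz"
```

Because index 1 is set in both cases, the callback needs no isset check.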

Upvotes: 2
