Simon

Reputation: 5247

Is it possible to mend a serialized string that has become corrupted due to truncation?

I have a massive multidimensional array that has been serialised by PHP. It was stored in MySQL, but the data field wasn't large enough, so the end has been cut off.

I need to extract the data, but unserialize won't work.

Does anyone know of a solution that can close all the arrays and recalculate the string lengths to produce a new valid serialized string?

It's too much data to do by hand.

Upvotes: 23

Views: 22473

Answers (15)

Carlos Aragón

Reputation: 1

Based on @Preciel's solution, this version fixes objects too:

public function unserialize(string $string2array): array {
    if (preg_match('/^(a:\d+:{)/', $string2array)) {
        preg_match_all('/((s:\d+:(?!").+(?!");)|(O:\d+:(?!").+(?!"):))/U', $string2array, $matches);
        foreach ($matches[0] as $match) {
            preg_match('/^((s|O):\d+:)/', $match, $strBase);
            $stringValue = substr($match, strlen($strBase[0]), -1);
            $endSymbol = substr($match, -1);
            $fixedValue = $strBase[2] . ':' . strlen($stringValue) . ':"' . $stringValue . '"' . $endSymbol;
            $string2array = str_replace($match, $fixedValue, $string2array);
        }
    }

    $string2array = preg_replace_callback(
        '/(a|s|b|d|i):(\d+):\"(.*?)\";/',
        function ($matches) {
            return $matches[1] . ":" . strlen($matches[3]) . ':"' . $matches[3] . '";';
        },
        $string2array
    );

    $unserializedString = (!empty($string2array) && @unserialize($string2array)) ? unserialize($string2array) : array();
    return $unserializedString;
}

Upvotes: 0

Preciel

Reputation: 2837

The top-voted answer does not fix a serialized array that contains an unquoted string value, such as a:1:{i:0;s:2:14;}

function unserialize_corrupted(string $str): array {
    // Fix serialized array with unquoted strings
    if(preg_match('/^(a:\d+:{)/', $str)) {
        preg_match_all('/(s:\d+:(?!").+(?!");)/U', $str, $pm_corruptedStringValues);

        foreach($pm_corruptedStringValues[0] as $_corruptedStringValue) {
            // Get post string data
            preg_match('/^(s:\d+:)/', $_corruptedStringValue, $pm_strBase);

            // Get unquoted string
            $stringValue = substr($_corruptedStringValue, strlen($pm_strBase[0]), -1);
            // Rebuild serialized data with quoted string
            $correctedStringValue = "$pm_strBase[0]\"$stringValue\";";

            // replace corrupted data
            $str = str_replace($_corruptedStringValue, $correctedStringValue, $str);
        }
    }

    // Fix offset error
    $str = preg_replace_callback(
        '/s:(\d+):\"(.*?)\";/',
        function($matches) { return "s:".strlen($matches[2]).':"'.$matches[2].'";'; },
        $str
    );

    $unserializedString = unserialize($str);

    if($unserializedString === false) {
        // Return empty array if string can't be fixed
        $unserializedString = array();
    }

    return $unserializedString;
}

Upvotes: 0

Jonas Hünig

Reputation: 101

We had some issues with this as well. In the end, we used a modified version of roman-newaza's code, which also works for data containing line breaks.

<?php 


$mysql = mysqli_connect("localhost", "...", "...", "...");
$res = mysqli_query($mysql, "SELECT option_id,option_value from ... where option_value like 'a:%'");

$prep = mysqli_prepare($mysql, "UPDATE ... set option_value = ? where option_id = ?");


function fix_str_length($matches) {
    $string = $matches[2];
    $right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count
    return 's:' . $right_length . ':"' . $string . '";';
}
function fix_serialized($string) {
    if ( !preg_match('/^[aOs]:/', $string) ) return $string;
    if ( @unserialize($string) !== false ) return $string;
    $data = preg_replace('%";%', "µµµ", $string);
    $tab = explode("µµµ", $data);
    $new_data = '';
    foreach ($tab as $line) {
        $new_data .= preg_replace_callback('%\bs:(\d+):"(.*)%s', 'fix_str_length', $line);
    }
    return $new_data;
}

while ( $val = mysqli_fetch_row($res) ) {
  $y = $val[0];
  $x = $val[1];

  $unSerialized = unserialize($x);

  //In case of failure let's try to repair it
  if($unSerialized === false){
      echo "fixing $y\n";
      $repairedSerialization = fix_serialized($x);
      //$unSerialized = unserialize($repairedSerialization);
      mysqli_stmt_bind_param($prep, "si", $repairedSerialization, $y);
      mysqli_stmt_execute($prep);
  }

}
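
The strlen comment in fix_str_length is worth a quick check. A minimal sketch, assuming the value is stored as UTF-8 (the sample word is just an illustration): the length prefix written by serialize() is a byte count, not a character count.

```php
<?php
// "\u{00E9}" is "é" encoded as two UTF-8 bytes (escape syntax needs PHP 7.0+).
$value = "caf\u{00E9}"; // 4 characters, 5 bytes

$byteLength = strlen($value);     // strlen() counts bytes
$serialized = serialize($value);  // the length prefix is that byte count

var_dump($byteLength);  // int(5)
var_dump($serialized);  // the prefix reads s:5:, not s:4:
```

So a repair that recalculated lengths with a character-counting function would break every multibyte string.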

Upvotes: 0

Vsevolod Azovsky

Reputation: 115

[UPD] Colleagues, I'm not sure whether this is allowed here, but for cases like this one I have built my own tool and put it on my website. Please try it: https://saysimsim.ru/tools/SerializedDataEditor

[Old text] Conclusion :-) After 3 days (instead of the estimated 2 hours) spent migrating a WordPress website to a new domain name, I finally found this page! Colleagues, please consider this my thank-you for all your answers. The code below combines all of your solutions with almost no additions. FYI: for me, SOLUTION 3 is the one that works most often. Kamal Saleh, you are the best!

function hlpSuperUnSerialize($str) {
    #region Simple Security
    if (
        empty($str)
        || !is_string($str)
        || !preg_match('/^[aOs]:/', $str)
    ) {
        return FALSE;
    }
    #endregion Simple Security

    #region SOLUTION 0
    // PHP default :-)
    $repSolNum = 0;
    $strFixed  = $str;
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 0

    #region SOLUTION 1
    // @link https://stackoverflow.com/a/5581004/3142281
    $repSolNum = 1;
    $strFixed  = preg_replace_callback(
        '/s:([0-9]+):\"(.*?)\";/',
        function ($matches) { return "s:" . strlen($matches[2]) . ':"' . $matches[2] . '";'; },
        $str
    );
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 1

    #region SOLUTION 2
    // @link https://stackoverflow.com/a/24995701/3142281
    $repSolNum = 2;
    $strFixed  = preg_replace_callback(
        '/s:([0-9]+):\"(.*?)\";/',
        function ($match) {
            return "s:" . strlen($match[2]) . ':"' . $match[2] . '";';
        },
        $str);
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 2

    #region SOLUTION 3
    // @link https://stackoverflow.com/a/34224433/3142281
    $repSolNum = 3;
    // securities
    $strFixed = preg_replace("%\n%", "", $str);
    // doublequote exploding
    $data     = preg_replace('%";%', "µµµ", $strFixed);
    $tab      = explode("µµµ", $data);
    $new_data = '';
    foreach ($tab as $line) {
        $new_data .= preg_replace_callback(
            '%\bs:(\d+):"(.*)%',
            function ($matches) {
                $string       = $matches[2];
                $right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count

                return 's:' . $right_length . ':"' . $string . '";';
            },
            $line);
    }
    $strFixed = $new_data;
    $arr      = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 3

    #region SOLUTION 4
    // @link https://stackoverflow.com/a/36454402/3142281
    $repSolNum = 4;
    $strFixed  = preg_replace_callback(
        '/s:([0-9]+):"(.*?)";/',
        function ($match) {
            return "s:" . strlen($match[2]) . ":\"" . $match[2] . "\";";
        },
        $str
    );
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 4

    #region SOLUTION 5
    // @link https://stackoverflow.com/a/38890855/3142281
    $repSolNum = 5;
    $strFixed  = preg_replace_callback('/s\:(\d+)\:\"(.*?)\";/s', function ($matches) { return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";'; }, $str);
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 5

    #region SOLUTION 6
    // @link https://stackoverflow.com/a/38891026/3142281
    $repSolNum = 6;
    $strFixed  = preg_replace_callback(
        '/s\:(\d+)\:\"(.*?)\";/s',
        function ($matches) { return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";'; },
        $str);
    $arr = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 6
    error_log('Completely unable to deserialize.');

    return FALSE;
}

Upvotes: 0

T.Todua

Reputation: 56527

Solution:

1) try online:

Serialized String Fixer (online tool)

2) Use function:

unserialize( serialize_corrector($serialized_string ) ) ;

code:

function serialize_corrector($serialized_string){
    // First check whether fixing is needed at all (unserialize() returns false on failure), then do a basic format check.
    if ( @unserialize($serialized_string) === false &&  preg_match('/^[aOs]:/', $serialized_string) ) {
        $serialized_string = preg_replace_callback( '/s\:(\d+)\:\"(.*?)\";/s',    function($matches){return 's:'.strlen($matches[2]).':"'.$matches[2].'";'; },   $serialized_string );
    }
    return $serialized_string;
} 
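
To see why the pattern above carries the /s modifier, here is a small sketch with a made-up two-line value. Without /s the dot does not match a newline, so a multi-line string is never matched and its length is never corrected.

```php
<?php
// Declared length 5 is wrong: the value "a\nb" is 3 bytes and spans two lines.
$broken = "a:1:{i:0;s:5:\"a\nb\";}";

// Apply the length-recalculating callback with a given pattern.
$fix = function (string $pattern) use ($broken): string {
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
        },
        $broken
    );
};

$withoutS = $fix('/s:(\d+):"(.*?)";/');  // "." stops at the newline, nothing matches
$withS    = $fix('/s:(\d+):"(.*?)";/s'); // "." also matches "\n"

var_dump($withoutS === $broken);            // bool(true): left unrepaired
var_dump(unserialize($withS) === ["a\nb"]); // bool(true)
```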

There is also this script, which I haven't tested.

Upvotes: 25

fabrik

Reputation: 14365

I think this is almost impossible. Before you can repair your array, you need to know how it is damaged. How many children are missing? What was the content?

Sorry, IMHO you can't do it.

Proof:

<?php

$serialized = serialize(
    [
        'one'   => 1,
        'two'   => 'nice',
        'three' => 'will be damaged'
    ]
);

var_dump($serialized); // a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:15:"will be damaged";}

var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"tee";s:15:"will be damaged";}')); // please note 'tee'

var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:')); // serialized string is truncated

Link: https://ideone.com/uvISQu

Even if you can recalculate the lengths of your keys and values, you cannot trust the data retrieved from this source, because you cannot recover the values themselves. E.g. if the serialized data is an object, your properties won't be accessible anymore.
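
For what it's worth, the pairs stored before the cut can still be read even though the tail is gone for good. A rough sketch under stated assumptions: salvage_value is a hypothetical helper (not a PHP function), it only understands N, b, i, d, s and a, it ignores objects, references and special floats such as INF, and it trusts the declared string lengths.

```php
<?php
// Hypothetical sketch: keep every complete element of a truncated serialize() string.
// $pos is the read offset; $complete is set to false once the data runs out.
function salvage_value(string $s, int &$pos, bool &$complete)
{
    if (substr($s, $pos, 2) === 'N;') {
        $pos += 2;
        return null;
    }
    // \G anchors the match at the offset passed as the fifth argument.
    if (preg_match('/\G([bid]):([^;]*);/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        if ($m[1] === 'b') {
            return $m[2] === '1';
        }
        return $m[1] === 'i' ? (int) $m[2] : (float) $m[2];
    }
    if (preg_match('/\Gs:(\d+):"/', $s, $m, 0, $pos)) {
        $start = $pos + strlen($m[0]);
        $len   = (int) $m[1];
        // Accept the string only if all bytes plus the closing '";' are present.
        if (substr($s, $start + $len, 2) === '";') {
            $pos = $start + $len + 2;
            return substr($s, $start, $len);
        }
    }
    if (preg_match('/\Ga:(\d+):\{/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        $out = [];
        for ($i = 0; $i < (int) $m[1] && $complete; $i++) {
            $key = salvage_value($s, $pos, $complete);
            if (!$complete || !(is_int($key) || is_string($key))) {
                $complete = false;
                break;
            }
            $val = salvage_value($s, $pos, $complete);
            // Keep complete values, and partially recovered nested arrays.
            if ($complete || is_array($val)) {
                $out[$key] = $val;
            }
        }
        if ($complete && substr($s, $pos, 1) === '}') {
            $pos++;
        } else {
            $complete = false;
        }
        return $out;
    }
    $complete = false; // truncated or unsupported token
    return null;
}

$truncated = 'a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:';
$pos = 0;
$complete = true;
$data = salvage_value($truncated, $pos, $complete);
var_dump($data);     // the two complete pairs: one => 1, two => "nice"
var_dump($complete); // bool(false): the input ended early
```

The key "three" is dropped because its value never made it into the column, which is exactly the part nobody can reconstruct.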

Upvotes: -5

Emil M

Reputation: 1123

This is recalculating the length of the elements in a serialized array:

$fixed = preg_replace_callback(
    '/s:([0-9]+):\"(.*?)\";/',
    function ($matches) { return "s:".strlen($matches[2]).':"'.$matches[2].'";';     },
    $serialized
);

However, it doesn't work if your strings contain ";. In that case it's not possible to fix the serialized array string automatically -- manual editing will be needed.
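
A short sketch of both cases, using made-up sample data. recalc_string_lengths is just a hypothetical wrapper name around the regex above.

```php
<?php
// Hypothetical wrapper name; the regex is the one from the answer above.
function recalc_string_lengths(string $serialized): string
{
    return preg_replace_callback(
        '/s:([0-9]+):"(.*?)";/',
        function (array $m): string {
            // strlen() counts bytes, which is what unserialize() expects
            return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
        },
        $serialized
    );
}

// Declared length 4 no longer matches the 9-byte value.
$broken = 'a:1:{s:3:"two";s:4:"very nice";}';
$fixed  = recalc_string_lengths($broken);
var_dump($fixed); // a:1:{s:3:"two";s:9:"very nice";}

// A value containing '";' ends the lazy match too early,
// so even a perfectly valid string gets damaged by the "fix".
$tricky  = serialize(['k' => 'a";b']);
$mangled = recalc_string_lengths($tricky);
var_dump(@unserialize($mangled)); // bool(false)
```

Because of the second case, it seems safer to try unserialize() first and only run the regex when that fails.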

Upvotes: 40

Mishu Vlad

Reputation: 301

I tried everything in this post and nothing worked for me. After hours of pain, here is what I found deep in the Google results, and it finally worked:

function fix_str_length($matches) {
    $string = $matches[2];
    $right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count
    return 's:' . $right_length . ':"' . $string . '";';
}
function fix_serialized($string) {
    // securities
    if ( !preg_match('/^[aOs]:/', $string) ) return $string;
    if ( @unserialize($string) !== false ) return $string;
    $string = preg_replace("%\n%", "", $string);
    // doublequote exploding
    $data = preg_replace('%";%', "µµµ", $string);
    $tab = explode("µµµ", $data);
    $new_data = '';
    foreach ($tab as $line) {
        $new_data .= preg_replace_callback('%\bs:(\d+):"(.*)%', 'fix_str_length', $line);
    }
    return $new_data;
}

You call the routine as follows:

//Let's consider we store the serialization inside a txt file
$corruptedSerialization = file_get_contents('corruptedSerialization.txt');

//Try to unserialize original string
$unSerialized = unserialize($corruptedSerialization);

//In case of failure let's try to repair it
if(!$unSerialized){
    $repairedSerialization = fix_serialized($corruptedSerialization);
    $unSerialized = unserialize($repairedSerialization);
}

//Keep your fingers crossed
var_dump($unSerialized);

Upvotes: 22

T.Todua

Reputation: 56527

Best Solution for me:

$output_array = unserialize(My_checker($serialized_string));

code:

function My_checker($serialized_string){
    // securities
    if (empty($serialized_string))                      return '';
    if ( !preg_match('/^[aOs]:/', $serialized_string) ) return $serialized_string;
    if ( @unserialize($serialized_string) !== false ) return $serialized_string;

    return
    preg_replace_callback(
        '/s\:(\d+)\:\"(.*?)\";/s', 
        function ($matches){  return 's:'.strlen($matches[2]).':"'.$matches[2].'";';  },
        $serialized_string )
    ;
}

Upvotes: 2

M Rostami

Reputation: 4195

Use preg_replace_callback() instead of preg_replace(.../e), because the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0.

$fixed_serialized_String = preg_replace_callback('/s:([0-9]+):\"(.*?)\";/',function($match) {
    return "s:".strlen($match[2]).':"'.$match[2].'";';
}, $serializedString);

$correct_array= unserialize($fixed_serialized_String);

Upvotes: 3

lubosdz

Reputation: 4500

The following snippet attempts to read and recursively parse a damaged serialized string (blob data), for example one that was cut off because it was too long for its database column. Numeric primitives and booleans are guaranteed to be valid; strings may be cut off and array keys may be missing. The routine may be useful if recovering a significant part (not all) of the data is good enough for you.

class Unserializer
{
    /**
    * Parse blob string tolerating corrupted strings & arrays
    * @param string $str Corrupted blob string
    */
    public static function parseCorruptedBlob(&$str)
    {
        // array pattern:    a:236:{...;}
        // integer pattern:  i:123;
        // double pattern:   d:329.0001122;
        // boolean pattern:  b:1; or b:0;
        // string pattern:   s:14:"date_departure";
        // null pattern:     N;
        // not supported: object O:{...}, reference R:{...}

        // NOTES:
        // - primitive types (bool, int, float) except for string are guaranteed uncorrupted
        // - arrays are tolerant to corrupted keys/values
        // - references & objects are not supported
        // - we use single byte string length calculation (strlen rather than mb_strlen) since source string is ISO-8859-2, not utf-8

        if(preg_match('/^a:(\d+):{/', $str, $match)){
            list($pattern, $cntItems) = $match;
            $str = substr($str, strlen($pattern));
            $array = [];
            for($i=0; $i<$cntItems; ++$i){
                $key = self::parseCorruptedBlob($str);
                if(trim($key)!==''){ // hmm, we wont allow null and "" as keys..
                    $array[$key] = self::parseCorruptedBlob($str);
                }
            }
            $str = ltrim($str, '}'); // closing array bracket
            return $array;
        }elseif(preg_match('/^s:(\d+):/', $str, $match)){
            list($pattern, $length) = $match;
            $str = substr($str, strlen($pattern));
            $val = substr($str, 0, $length + 2); // include also surrounding double quotes
            $str = substr($str, strlen($val) + 1); // include also semicolon
            $val = trim($val, '"'); // remove surrounding double quotes
            if(preg_match('/^a:(\d+):{/', $val)){
                // parse instantly another serialized array
                return (array) self::parseCorruptedBlob($val);
            }else{
                return (string) $val;
            }
        }elseif(preg_match('/^i:(-?\d+);/', $str, $match)){ // allow negative integers
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (int) $val;
        }elseif(preg_match('/^d:(-?[\d.]+);/', $str, $match)){ // allow negative doubles
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (float) $val;
        }elseif(preg_match('/^b:(0|1);/', $str, $match)){
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (bool) $val;
        }elseif(preg_match('/^N;/', $str, $match)){
            $str = substr($str, strlen('N;'));
            return null;
        }
    }
}

// usage:
$unserialized = Unserializer::parseCorruptedBlob($serializedString);

Upvotes: 4

Kamal Saleh

Reputation: 499

Based on @Emil M's answer, here is a fixed version that works with text containing double quotes.

function fix_broken_serialized_array($match) {
    return "s:".strlen($match[2]).":\"".$match[2]."\";"; 
}
$fixed = preg_replace_callback(
    '/s:([0-9]+):"(.*?)";/',
    "fix_broken_serialized_array",
    $serialized
);
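
A quick check with a made-up value, as a sketch: embedded double quotes are handled as long as none of them is immediately followed by a semicolon, because the lazy match only stops at the first ";.

```php
<?php
// The value say "hi" is 8 bytes, but the declared length is 3.
$broken = 'a:1:{i:0;s:3:"say "hi"";}';

$fixed = preg_replace_callback(
    '/s:([0-9]+):"(.*?)";/',
    function (array $m): string {
        return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
    },
    $broken
);

var_dump($fixed);              // a:1:{i:0;s:8:"say "hi"";}
var_dump(unserialize($fixed)); // one element: say "hi"
```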

Upvotes: 0

Mahran Elneel

Reputation: 196

You can turn invalid serialized data back into a normal array :)

$str = "a:1:{i:0;a:4:{s:4:\"name\";s:26:\"20141023_544909d85b868.rar\";s:5:\"dname\";s:20:\"HTxRcEBC0JFRWhtk.rar\";s:4:\"size\";i:19935;s:4:\"dead\";i:0;}}";

// Note: the pattern $re is not defined in this snippet.
preg_match_all($re, $str, $matches);

if(is_array($matches) && !empty($matches[1]) && !empty($matches[2]))
{
    foreach($matches[1] as $ksel => $serv)
    {
        if(!empty($serv))
        {
            $retva[] = $serv;
        }else{
            $retva[] = $matches[2][$ksel];
        }
    }

    $count = 0;
    $arrk = array();
    $arrv = array();
    if(is_array($retva))
    {
        foreach($retva as $k => $va)
        {
            ++$count;
            if($count/2 == 1)
            {
                $arrv[] = $va;
                $count = 0;
            }else{
                $arrk[] = $va;
            }
        }
        $returnse = array_combine($arrk,$arrv);
    }

}

print_r($returnse);

Upvotes: -3

Ben

Reputation: 62454

Serializing is almost always bad because you can't search it in any way. Sorry, but it seems as though you're backed into a corner...

Upvotes: -5

Quamis

Reputation: 11087

I doubt anyone would write code to retrieve partially saved arrays :) I fixed something like this once, but by hand, and it took hours. Then I realized I didn't need that part of the array...

Unless it's really important data (and I mean REALLY important), you'd be better off letting this one go.

Upvotes: -3
