Reputation: 5247
I have a massive multidimensional array that has been serialized by PHP. It was stored in MySQL, but the data field wasn't large enough, so the end has been cut off.
I need to extract the data, but unserialize won't work.
Does anyone know of a solution that can close all the arrays and recalculate the string lengths to produce a new valid serialized string?
It's too much data to do by hand.
Upvotes: 23
Views: 22473
Reputation: 1
Based on @Preciel's solution, this version also fixes objects:
public function unserialize(string $string2array): array {
if (preg_match('/^(a:\d+:{)/', $string2array)) {
preg_match_all('/((s:\d+:(?!").+(?!");)|(O:\d+:(?!").+(?!"):))/U', $string2array, $matches);
foreach ($matches[0] as $match) {
preg_match('/^((s|O):\d+:)/', $match, $strBase);
$stringValue = substr($match, strlen($strBase[0]), -1);
$endSymbol = substr($match, -1);
$fixedValue = $strBase[2] . ':' . strlen($stringValue) . ':"' . $stringValue . '"' . $endSymbol;
$string2array = str_replace($match, $fixedValue, $string2array);
}
}
$string2array = preg_replace_callback(
'/(a|s|b|d|i):(\d+):\"(.*?)\";/',
function ($matches) {
return $matches[1] . ":" . strlen($matches[3]) . ':"' . $matches[3] . '";';
},
$string2array
);
// Unserialize once; fall back to an empty array if the string still cannot be read
$unserializedString = @unserialize($string2array);
return is_array($unserializedString) ? $unserializedString : array();
}
Upvotes: 0
Reputation: 2837
The top-voted answer does not fix a serialized array that contains an unquoted string value, such as a:1:{i:0;s:2:14;}
function unserialize_corrupted(string $str): array {
// Fix serialized array with unquoted strings
if(preg_match('/^(a:\d+:{)/', $str)) {
preg_match_all('/(s:\d+:(?!").+(?!");)/U', $str, $pm_corruptedStringValues);
foreach($pm_corruptedStringValues[0] as $_corruptedStringValue) {
// Get post string data
preg_match('/^(s:\d+:)/', $_corruptedStringValue, $pm_strBase);
// Get unquoted string
$stringValue = substr($_corruptedStringValue, strlen($pm_strBase[0]), -1);
// Rebuild serialized data with quoted string
$correctedStringValue = "$pm_strBase[0]\"$stringValue\";";
// replace corrupted data
$str = str_replace($_corruptedStringValue, $correctedStringValue, $str);
}
}
// Fix offset error
$str = preg_replace_callback(
'/s:(\d+):\"(.*?)\";/',
function($matches) { return "s:".strlen($matches[2]).':"'.$matches[2].'";'; },
$str
);
$unserializedString = unserialize($str);
if($unserializedString === false) {
// Return empty array if string can't be fixed
$unserializedString = array();
}
return $unserializedString;
}
Upvotes: 0
Reputation: 101
We had some issues with this as well. In the end, we used a modified version of roman-newaza's answer, which also works for data containing line breaks.
<?php
$mysql = mysqli_connect("localhost", "...", "...", "...");
$res = mysqli_query($mysql, "SELECT option_id,option_value from ... where option_value like 'a:%'");
$prep = mysqli_prepare($mysql, "UPDATE ... set option_value = ? where option_id = ?");
function fix_str_length($matches) {
$string = $matches[2];
$right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count
return 's:' . $right_length . ':"' . $string . '";';
}
function fix_serialized($string) {
if ( !preg_match('/^[aOs]:/', $string) ) return $string;
if ( @unserialize($string) !== false ) return $string;
$data = preg_replace('%";%', "µµµ", $string);
$tab = explode("µµµ", $data);
$new_data = '';
foreach ($tab as $line) {
$new_data .= preg_replace_callback('%\bs:(\d+):"(.*)%s', 'fix_str_length', $line);
}
return $new_data;
}
while ( $val = mysqli_fetch_row($res) ) {
$y = $val[0];
$x = $val[1];
$unSerialized = unserialize($x);
//In case of failure let's try to repair it
if($unSerialized === false){
echo "fixing $y\n";
$repairedSerialization = fix_serialized($x);
//$unSerialized = unserialize($repairedSerialization);
mysqli_stmt_bind_param($prep, "si", $repairedSerialization, $y);
mysqli_stmt_execute($prep);
}
}
Upvotes: 0
Reputation: 115
[UPD] Colleagues, I'm not sure whether this is allowed here, but I've built my own tool for cases like this and put it on my website. Please try it: https://saysimsim.ru/tools/SerializedDataEditor
[Old text] Conclusion :-) After 3 days (instead of the estimated 2 hours) of migrating a WordPress website to a new domain name, I finally found this page! Colleagues, please consider this my thank-you for all your answers. The code below combines all your solutions with almost no additions. FYI: for me, SOLUTION 3 works most often. Kamal Saleh, you are the best!
function hlpSuperUnSerialize($str) {
#region Simple Security
if (
empty($str)
|| !is_string($str)
|| !preg_match('/^[aOs]:/', $str)
) {
return FALSE;
}
#endregion Simple Security
#region SOLUTION 0
// PHP default :-)
$repSolNum = 0;
$strFixed = $str;
$arr = @unserialize($strFixed);
if (FALSE !== $arr) {
error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
return $arr;
}
#endregion SOLUTION 0
#region SOLUTION 1
// @link https://stackoverflow.com/a/5581004/3142281
$repSolNum = 1;
$strFixed = preg_replace_callback(
'/s:([0-9]+):\"(.*?)\";/',
function ($matches) { return "s:" . strlen($matches[2]) . ':"' . $matches[2] . '";'; },
$str
);
$arr = @unserialize($strFixed);
if (FALSE !== $arr) {
error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
return $arr;
}
#endregion SOLUTION 1
#region SOLUTION 2
// @link https://stackoverflow.com/a/24995701/3142281
$repSolNum = 2;
$strFixed = preg_replace_callback(
'/s:([0-9]+):\"(.*?)\";/',
function ($match) {
return "s:" . strlen($match[2]) . ':"' . $match[2] . '";';
},
$str);
$arr = @unserialize($strFixed);
if (FALSE !== $arr) {
error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
return $arr;
}
#endregion SOLUTION 2
#region SOLUTION 3
// @link https://stackoverflow.com/a/34224433/3142281
$repSolNum = 3;
// securities
$strFixed = preg_replace("%\n%", "", $str);
// doublequote exploding
$data = preg_replace('%";%', "µµµ", $strFixed);
$tab = explode("µµµ", $data);
$new_data = '';
foreach ($tab as $line) {
$new_data .= preg_replace_callback(
'%\bs:(\d+):"(.*)%',
function ($matches) {
$string = $matches[2];
$right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count
return 's:' . $right_length . ':"' . $string . '";';
},
$line);
}
$strFixed = $new_data;
$arr = @unserialize($strFixed);
if (FALSE !== $arr) {
error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
return $arr;
}
#endregion SOLUTION 3
#region SOLUTION 4
// @link https://stackoverflow.com/a/36454402/3142281
$repSolNum = 4;
$strFixed = preg_replace_callback(
'/s:([0-9]+):"(.*?)";/',
function ($match) {
return "s:" . strlen($match[2]) . ":\"" . $match[2] . "\";";
},
$str
);
$arr = @unserialize($strFixed);
if (FALSE !== $arr) {
error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
return $arr;
}
#endregion SOLUTION 4
#region SOLUTION 5
// @link https://stackoverflow.com/a/38890855/3142281
$repSolNum = 5;
$strFixed = preg_replace_callback('/s\:(\d+)\:\"(.*?)\";/s', function ($matches) { return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";'; }, $str);
$arr = @unserialize($strFixed);
if (FALSE !== $arr) {
error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
return $arr;
}
#endregion SOLUTION 5
#region SOLUTION 6
// @link https://stackoverflow.com/a/38891026/3142281
$repSolNum = 6;
$strFixed = preg_replace_callback(
'/s\:(\d+)\:\"(.*?)\";/s',
function ($matches) { return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";'; },
$str);
$arr = @unserialize($strFixed);
if (FALSE !== $arr) {
error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");
return $arr;
}
#endregion SOLUTION 6
error_log('Completely unable to deserialize.');
return FALSE;
}
Upvotes: 0
Reputation: 56527
1) Try online:
Serialized String Fixer (online tool)
2) Use this function:
unserialize(serialize_corrector($serialized_string));
Code:
function serialize_corrector($serialized_string){
// at first, check if "fixing" is really needed at all. After that, security checkup.
if ( @unserialize($serialized_string) === false && preg_match('/^[aOs]:/', $serialized_string) ) {
$serialized_string = preg_replace_callback( '/s\:(\d+)\:\"(.*?)\";/s', function($matches){return 's:'.strlen($matches[2]).':"'.$matches[2].'";'; }, $serialized_string );
}
return $serialized_string;
}
There is also this script, which I haven't tested.
Upvotes: 25
Reputation: 14365
I think this is almost impossible. Before you can repair your array, you need to know how it is damaged. How many children are missing? What was the content?
Sorry, but in my opinion you can't do it.
Proof:
<?php
$serialized = serialize(
[
'one' => 1,
'two' => 'nice',
'three' => 'will be damaged'
]
);
var_dump($serialized); // a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:15:"will be damaged";}
var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"tee";s:15:"will be damaged";}')); // please note 'tee'
var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:')); // serialized string is truncated
Link: https://ideone.com/uvISQu
Even if you can recalculate the lengths of your keys and values, you cannot trust the data retrieved from this source, because you cannot recover the values themselves. For example, if the serialized data is an object, its properties won't be accessible anymore.
Upvotes: -5
Reputation: 1123
This recalculates the length of the string elements in a serialized array:
$fixed = preg_replace_callback(
'/s:([0-9]+):\"(.*?)\";/',
function ($matches) { return "s:".strlen($matches[2]).':"'.$matches[2].'";'; },
$serialized
);
However, it doesn't work if your strings contain the sequence "; (a double quote followed by a semicolon). In that case it's not possible to fix the serialized array string automatically, and manual editing will be needed.
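As a rough illustration of both points, the sketch below wraps the regex above in a helper (the name recalc_string_lengths is made up for this example) and runs it on one string with a wrong length and on one valid string whose value contains ";. It is only a demonstration under those assumptions, not a general repair tool.

```php
<?php
// Hypothetical helper name; the regex and callback are the ones from the answer above.
function recalc_string_lengths(string $serialized): string
{
    $result = preg_replace_callback(
        '/s:([0-9]+):\"(.*?)\";/',
        function ($matches) {
            return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";';
        },
        $serialized
    );
    // preg_replace_callback() returns null on a PCRE error; keep the input in that case.
    return $result ?? $serialized;
}

// Case 1: the declared length (9) is wrong; the real byte length of "nice" is 4.
$broken = 'a:1:{s:3:"two";s:9:"nice";}';
$fixed  = recalc_string_lengths($broken);
var_dump(unserialize($fixed));

// Case 2: a valid string whose value contains "; is cut short by the lazy match,
// so running the fix on it actually breaks it.
$tricky  = serialize(['k' => 'a";b']);
$mangled = recalc_string_lengths($tricky);
var_dump(@unserialize($mangled) === false);
```

Because of the second case, it seems safer to apply the regex only after a plain unserialize() has already failed.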
Upvotes: 40
Reputation: 301
I tried everything in this post and nothing worked for me. After hours of pain, here's what I found deep in Google's results, and it finally worked:
function fix_str_length($matches) {
$string = $matches[2];
$right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count
return 's:' . $right_length . ':"' . $string . '";';
}
function fix_serialized($string) {
// securities
if ( !preg_match('/^[aOs]:/', $string) ) return $string;
if ( @unserialize($string) !== false ) return $string;
$string = preg_replace("%\n%", "", $string);
// doublequote exploding
$data = preg_replace('%";%', "µµµ", $string);
$tab = explode("µµµ", $data);
$new_data = '';
foreach ($tab as $line) {
$new_data .= preg_replace_callback('%\bs:(\d+):"(.*)%', 'fix_str_length', $line);
}
return $new_data;
}
You call the routine as follows:
//Let's consider we store the serialization inside a txt file
$corruptedSerialization = file_get_contents('corruptedSerialization.txt');
//Try to unserialize original string
$unSerialized = unserialize($corruptedSerialization);
//In case of failure let's try to repair it
if(!$unSerialized){
$repairedSerialization = fix_serialized($corruptedSerialization);
$unSerialized = unserialize($repairedSerialization);
}
//Keep your fingers crossed
var_dump($unSerialized);
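The comment in fix_str_length about using strlen even for UTF-8 text can be checked with a small example. The sample word is an assumption chosen for illustration; it is written with byte escapes so the result does not depend on how the file is saved.

```php
<?php
// "café" in UTF-8: 4 characters, but 5 bytes because "é" takes two bytes.
$word = "caf\xC3\xA9";

// strlen() counts bytes, which is what serialize() writes into the s:N: prefix.
var_dump(strlen($word));
var_dump(serialize($word));
```

The length written by serialize() here is 5, matching strlen(), which is why the repair functions count bytes rather than characters.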
Upvotes: 22
Reputation: 56527
$output_array = unserialize(My_checker($serialized_string));
code:
function My_checker($serialized_string){
// securities
if (empty($serialized_string)) return '';
if ( !preg_match('/^[aOs]:/', $serialized_string) ) return $serialized_string;
if ( @unserialize($serialized_string) !== false ) return $serialized_string;
return
preg_replace_callback(
'/s\:(\d+)\:\"(.*?)\";/s',
function ($matches){ return 's:'.strlen($matches[2]).':"'.$matches[2].'";'; },
$serialized_string )
;
}
Upvotes: 2
Reputation: 4195
Using preg_replace_callback() instead of preg_replace(.../e) (because the /e modifier is deprecated):
$fixed_serialized_String = preg_replace_callback('/s:([0-9]+):\"(.*?)\";/',function($match) {
return "s:".strlen($match[2]).':"'.$match[2].'";';
}, $serializedString);
$correct_array= unserialize($fixed_serialized_String);
Upvotes: 3
Reputation: 4500
The following snippet attempts to recursively read and parse a damaged serialized string (blob data), for example one that was cut off because it was too long for its database column. Numeric primitives and booleans are guaranteed to be valid; strings may be cut off and array keys may be missing. The routine is useful when recovering a significant part (not all) of the data is good enough.
class Unserializer
{
/**
* Parse blob string tolerating corrupted strings & arrays
* @param string $str Corrupted blob string
*/
public static function parseCorruptedBlob(&$str)
{
// array pattern: a:236:{...;}
// integer pattern: i:123;
// double pattern: d:329.0001122;
// boolean pattern: b:1; or b:0;
// string pattern: s:14:"date_departure";
// null pattern: N;
// not supported: object O:{...}, reference R:{...}
// NOTES:
// - primitive types (bool, int, float) except for string are guaranteed uncorrupted
// - arrays are tolerant to corrupted keys/values
// - references & objects are not supported
// - we use single byte string length calculation (strlen rather than mb_strlen) since source string is ISO-8859-2, not utf-8
if(preg_match('/^a:(\d+):{/', $str, $match)){
list($pattern, $cntItems) = $match;
$str = substr($str, strlen($pattern));
$array = [];
for($i=0; $i<$cntItems; ++$i){
$key = self::parseCorruptedBlob($str);
if(trim($key)!==''){ // hmm, we wont allow null and "" as keys..
$array[$key] = self::parseCorruptedBlob($str);
}
}
$str = ltrim($str, '}'); // closing array bracket
return $array;
}elseif(preg_match('/^s:(\d+):/', $str, $match)){
list($pattern, $length) = $match;
$str = substr($str, strlen($pattern));
$val = substr($str, 0, $length + 2); // include also surrounding double quotes
$str = substr($str, strlen($val) + 1); // include also semicolon
$val = trim($val, '"'); // remove surrounding double quotes
if(preg_match('/^a:(\d+):{/', $val)){
// parse instantly another serialized array
return (array) self::parseCorruptedBlob($val);
}else{
return (string) $val;
}
}elseif(preg_match('/^i:(-?\d+);/', $str, $match)){
list($pattern, $val) = $match;
$str = substr($str, strlen($pattern));
return (int) $val;
}elseif(preg_match('/^d:(-?[\d.]+(?:E[+-]?\d+)?);/i', $str, $match)){
list($pattern, $val) = $match;
$str = substr($str, strlen($pattern));
return (float) $val;
}elseif(preg_match('/^b:(0|1);/', $str, $match)){
list($pattern, $val) = $match;
$str = substr($str, strlen($pattern));
return (bool) $val;
}elseif(preg_match('/^N;/', $str, $match)){
$str = substr($str, strlen('N;'));
return null;
}
}
}
// usage:
$unserialized = Unserializer::parseCorruptedBlob($serializedString);
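For comparison, here is a separate, much smaller sketch of the same idea. It is not the class above: the function name parse_truncated is invented for this example, it assumes the only damage is truncation (string lengths are still correct), and it supports only arrays, strings, integers, floats, booleans and null.

```php
<?php
// Sketch: read as much as possible from a truncated serialize() string.
// $pos is advanced only after a token has been read successfully.
function parse_truncated(string $s, int &$pos)
{
    if (preg_match('/\GN;/', $s, $m, 0, $pos)) {
        $pos += 2;
        return null;
    }
    if (preg_match('/\G([ibd]):([^;]+);/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        if ($m[1] === 'i') return (int) $m[2];
        if ($m[1] === 'b') return $m[2] === '1';
        return (float) $m[2]; // INF and NAN are not handled in this sketch
    }
    if (preg_match('/\Gs:(\d+):"/', $s, $m, 0, $pos)) {
        $len   = (int) $m[1];
        $start = $pos + strlen($m[0]);
        if ($start + $len + 2 > strlen($s)) {
            throw new UnexpectedValueException("string cut off at byte $pos");
        }
        $pos = $start + $len + 2; // skip the value and the closing ";
        return substr($s, $start, $len);
    }
    if (preg_match('/\Ga:(\d+):\{/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        $out = [];
        try {
            for ($i = 0; $i < (int) $m[1]; $i++) {
                $key = parse_truncated($s, $pos);
                $out[$key] = parse_truncated($s, $pos);
            }
            $pos++; // closing brace
        } catch (UnexpectedValueException $e) {
            // Input ended early: keep the pairs read so far.
        }
        return $out;
    }
    throw new UnexpectedValueException("unreadable data at byte $pos");
}

// Usage: cut a valid string off in the middle of its last value.
$full = serialize(['one' => 1, 'two' => 'nice', 'three' => 'will be damaged']);
$truncated = substr($full, 0, 60);
$pos = 0;
$data = parse_truncated($truncated, $pos);
print_r($data);
```

The key whose value was cut off is dropped entirely instead of being kept with a partial string; that is a design choice of this sketch and differs from the class above, which keeps cut-off strings.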
Upvotes: 4
Reputation: 499
Based on @Emil M's answer, here is a fixed version that works with text containing double quotes.
function fix_broken_serialized_array($match) {
return "s:".strlen($match[2]).":\"".$match[2]."\";";
}
$fixed = preg_replace_callback(
'/s:([0-9]+):"(.*?)";/',
"fix_broken_serialized_array",
$serialized
);
Upvotes: 0
Reputation: 196
You can turn invalid serialized data back into a normal array :)
// Note: $re (the regular expression) is not defined in this snippet.
$str = "a:1:{i:0;a:4:{s:4:\"name\";s:26:\"20141023_544909d85b868.rar\";s:5:\"dname\";s:20:\"HTxRcEBC0JFRWhtk.rar\";s:4:\"size\";i:19935;s:4:\"dead\";i:0;}}";
preg_match_all($re, $str, $matches);
if(is_array($matches) && !empty($matches[1]) && !empty($matches[2]))
{
foreach($matches[1] as $ksel => $serv)
{
if(!empty($serv))
{
$retva[] = $serv;
}else{
$retva[] = $matches[2][$ksel];
}
}
$count = 0;
$arrk = array();
$arrv = array();
if(is_array($retva))
{
foreach($retva as $k => $va)
{
++$count;
if($count/2 == 1)
{
$arrv[] = $va;
$count = 0;
}else{
$arrk[] = $va;
}
}
$returnse = array_combine($arrk,$arrv);
}
}
print_r($returnse);
Upvotes: -3
Reputation: 62454
Serializing is almost always bad because you can't search it in any way. Sorry, but it seems as though you're backed into a corner...
Upvotes: -5
Reputation: 11087
I doubt anyone would write code to retrieve partially saved arrays :) I fixed something like this once, but by hand, and it took hours. Then I realized I didn't need that part of the array...
Unless it's really important data (and I mean REALLY important), you'd be better off letting this one go.
Upvotes: -3