Reputation: 57326
I've worked with PHP for a few years now, but until now I never had a need to deal with serialisation explicitly, other than through $_SESSION. Now I have a project that requires me to implement a serialisation mechanism for certain data manually, and I realise that the issue applies to $_SESSION as well.
I have a class that contains a number of properties. Most of these properties are small (in terms of memory consumption): numbers, relatively short strings, etc. However, the class also contains some properties that may hold HUGE arrays (e.g. an entire dump of a database table: 100,000 rows with 100 fields each). As it happens, this is one of the classes that needs to be serialised/deserialised, and, luckily, the properties containing large arrays don't need to be serialised, as they are essentially temporary pieces of work and are rebuilt anyway as necessary.
In such circumstances in Java, I would simply declare the property as transient, and it would be omitted from serialisation. Unfortunately, PHP doesn't support such qualifiers.
One way to deal with it is something like this:
class A implements Serializable
{
    private $var_small = 1234;
    private $var_big = array(); // huge array, filled at runtime (not initialised like this, of course)
    public function serialize()
    {
        $vars = get_object_vars($this);
        unset($vars['var_big']); // drop the "transient" property
        return serialize($vars);
    }
    public function unserialize($data)
    {
        $vars = unserialize($data);
        foreach ($vars as $var => $value) {
            $this->$var = $value;
        }
    }
}
However, this is rather cumbersome, as I would need to update the serialize() method every time I add another transient property. Also, once inheritance comes into play, this becomes even more complicated, as transient properties may be in both the subclass and the parent. I know it's still doable; however, I would prefer to delegate as much as possible to the language rather than reinvent the wheel.
So, what's the best way to deal with transient properties? Or am I missing something and PHP supports this out of the box?
Upvotes: 9
Views: 4693
Reputation: 919
Disclaimer: If no inheritance is involved, or all properties are public or protected, you can use one of the many solutions already posted. The solution discussed here is designed to work with inheritance and private properties. It's especially useful for excluding injected dependencies.
Basically, use __sleep() to exclude properties from serialization. But we need a way to extract all property names of $this. Use __wakeup() to re-establish the lost connections/data.
If __sleep() is present in your class and __serialize() is not, serialize() uses __sleep() to get the list of properties that should be serialized. The list is one-dimensional, but it has to use a specific format so that PHP can determine which private property belongs to which class. This is the format, where \0 is a NUL byte:
[
    "publicProperty",
    "\0*\0protectedProperty",
    "\0ClassName\0privateProperty",
]
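To see these keys for yourself, here is a quick sketch (the classes Base and Derived and their properties are hypothetical) that casts an object to an array and prints its keys with the NUL bytes made visible:

```php
<?php
// Sketch: inspect the mangled property names PHP uses internally.
// Base, Derived and all property names are hypothetical.
class Base {
    public $pub = 1;
    protected $prot = 2;
    private $priv = 3;
}
class Derived extends Base {
    private $own = 4;
}
// The (array) cast exposes every property under its mangled key.
$keys = array_keys((array) new Derived());
// Replace the NUL bytes with a visible "\0" for printing.
foreach ($keys as $key) {
    echo str_replace("\0", '\0', $key), PHP_EOL;
}
```

Public names stay plain, protected names get the `\0*\0` prefix, and private names are prefixed with their declaring class, which is exactly the shape __sleep() expects.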
To bring the property list into the correct format, we found two solutions.
Note: __sleep() only needs to be implemented in the parent class.
(array) cast
The array cast produces an array with all properties of an object, using the correct (mangled) keys.
Note: This solution is still quite error-prone, especially during refactoring, since excluded property names are hardcoded as strings.
class ParentClass {
    private $parentProperty1;
    private $parentProperty2;
    public function __construct() {
        $this->parentProperty1 = 'Parent Property 1';
        $this->parentProperty2 = 'Parent Property 2';
    }
    public function __sleep() {
        // Mangled names of the properties to skip ("\0" is a NUL byte)
        $excludedProperties = [
            "\0ParentClass\0parentProperty1",
        ];
        // The (array) cast lists every property, including subclass privates, under its mangled name
        $properties = (array)$this;
        return array_values(array_filter(array_keys($properties), function ($propertyName) use ($excludedProperties) {
            return !in_array($propertyName, $excludedProperties, true);
        }));
    }
}
class ChildClass extends ParentClass {
    private $childProperty3;
    public function __construct()
    {
        parent::__construct();
        $this->childProperty3 = 'Child Property 3';
    }
}
$child = new ChildClass();
var_dump($child);
$serialized = serialize($child);
var_dump($serialized);
$child = unserialize($serialized);
var_dump($child);
Result
object(ChildClass)#1 (3) {
["parentProperty1":"ParentClass":private] => string(17) "Parent Property 1"
["parentProperty2":"ParentClass":private] => string(17) "Parent Property 2"
["childProperty3":"ChildClass":private] => string(16) "Child Property 3"
}
string(141) "O:10:"ChildClass":2:{s:28:" ParentClass parentProperty2";s:17:"Parent Property 2";s:26:" ChildClass childProperty3";s:16:"Child Property 3";}"
object(ChildClass)#2 (3) {
["parentProperty1":"ParentClass":private] => NULL
["parentProperty2":"ParentClass":private] => string(17) "Parent Property 2"
["childProperty3":"ChildClass":private] => string(16) "Child Property 3"
}
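One way to soften the refactoring risk of hardcoded keys is to build the mangled key from `self::class` instead of a string literal. Since `self` always refers to the class that declares the method, the key stays correct when `$this` is a subclass and follows IDE class renames. A minimal sketch under that assumption (the property name is still a string; `Account`, `$id`, `$cache` and `warmCache()` are hypothetical):

```php
<?php
// Sketch: derive the mangled key from self::class instead of a literal.
// Account, $id, $cache and warmCache() are hypothetical names.
class Account {
    private $id = 42;
    private $cache = []; // transient: rebuilt on demand, never serialized
    public function warmCache() {
        $this->cache = array_fill(0, 1000, 'row');
    }
    public function __sleep() {
        // self::class is the class declaring this method, not get_class($this).
        $excluded = ["\0" . self::class . "\0cache"];
        return array_values(array_filter(
            array_keys((array) $this),
            fn ($name) => !in_array($name, $excluded, true)
        ));
    }
    public function getId() { return $this->id; }
    public function getCache() { return $this->cache; }
}
$account = new Account();
$account->warmCache();
$serialized = serialize($account);
// Excluded properties simply keep their declared default after unserialize().
$copy = unserialize($serialized);
```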
Reflection
This trades speed for elegance and readability: using reflection is about 2x slower according to some simple, non-representative benchmarks listed below. Just so you have a baseline, serializing a complex object in our production code takes about 60 microseconds (which isn't representative either).
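The code loops over all properties of this class and all of its parent classes. It checks whether each property is private or protected and builds the property names accordingly, skipping static properties and those marked with the #[DoNotSerialize] attribute (attributes require PHP 8.0).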
#[Attribute]
class DoNotSerialize {}
class ParentClass {
    private $parentProperty1;
    #[DoNotSerialize]
    private $parentProperty2;
    public function __construct(
        #[DoNotSerialize] private $parentProperty3,
    ) {
        $this->parentProperty1 = 'Parent Property 1';
        $this->parentProperty2 = 'Parent Property 2';
    }
    public function __sleep() {
        $props = [];
        $reflectionClass = new ReflectionClass($this);
        do {
            $reflectionProps = $reflectionClass->getProperties();
            foreach ($reflectionProps as $reflectionProp) {
                // $reflectionProp->setAccessible(true); // no longer needed as of PHP 8.1
                if (empty($reflectionProp->getAttributes(DoNotSerialize::class)) && !$reflectionProp->isStatic()) {
                    $propertyName = $reflectionProp->getName();
                    // PHP uses NUL-byte prefixes to represent visibility in property names
                    if ($reflectionProp->isPrivate()) {
                        $propertyName = "\0" . $reflectionProp->getDeclaringClass()->getName() . "\0" . $propertyName;
                    } elseif ($reflectionProp->isProtected()) {
                        $propertyName = "\0*\0" . $propertyName;
                    }
                    $props[] = $propertyName;
                }
            }
            $reflectionClass = $reflectionClass->getParentClass();
        } while ($reflectionClass);
        // getProperties() also returns inherited public/protected properties,
        // so the same name can be collected at several levels: drop duplicates
        return array_values(array_unique($props));
    }
}
class ChildClass extends ParentClass {
    private $childProperty4;
    public function __construct()
    {
        parent::__construct('Parent Property 3');
        $this->childProperty4 = 'Child Property 4';
    }
}
$child = new ChildClass();
var_dump($child);
$serialized = serialize($child);
var_dump($serialized);
$child = unserialize($serialized);
var_dump($child);
Result
object(ChildClass)#1 (4) {
["parentProperty1":"ParentClass":private] => string(17) "Parent Property 1"
["parentProperty2":"ParentClass":private] => string(17) "Parent Property 2"
["parentProperty3":"ParentClass":private] => string(17) "Parent Property 3"
["childProperty4":"ChildClass":private] => string(16) "Child Property 4"
}
string(141) "O:10:"ChildClass":2:{s:26:" ChildClass childProperty4";s:16:"Child Property 4";s:28:" ParentClass parentProperty1";s:17:"Parent Property 1";}"
object(ChildClass)#6 (4) {
["parentProperty1":"ParentClass":private] => string(17) "Parent Property 1"
["parentProperty2":"ParentClass":private] => NULL
["parentProperty3":"ParentClass":private] => NULL
["childProperty4":"ChildClass":private] => string(16) "Child Property 4"
}
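Why walk up the parent chain at all? `ReflectionClass::getProperties()` returns a class's own properties plus inherited public and protected ones, but never private properties declared in a parent. A small sketch (the classes Base and Derived are hypothetical) makes this visible, and also shows that a protected property is seen at every level, so the collected names need de-duplicating before __sleep() returns them:

```php
<?php
// Sketch: what ReflectionClass::getProperties() sees at each level.
// Base, Derived and their properties are hypothetical.
class Base {
    private $basePrivate;
    protected $baseProtected;
}
class Derived extends Base {
    private $derivedPrivate;
}
// Sorted property names as reported by reflection for one class.
function propertyNames(string $class): array {
    $names = array_map(
        fn (ReflectionProperty $p) => $p->getName(),
        (new ReflectionClass($class))->getProperties()
    );
    sort($names);
    return $names;
}
// Derived sees its own private and the inherited protected property,
// but not Base's private one.
$derivedNames = propertyNames(Derived::class);
$baseNames = propertyNames(Base::class);
print_r($derivedNames);
print_r($baseNames);
```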
Benchmarks
(array) cast
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $child = new ChildClass();
    $serializedArray = serialize($child);
}
$end = microtime(true);
$timeArray = ($end - $start) * 1000;
echo "Time: $timeArray ms" . PHP_EOL;
Result: Time: 587.77904510498 ms
Reflection
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $child = new ChildClass();
    $serializedReflection = serialize($child);
}
$end = microtime(true);
$timeReflection = ($end - $start) * 1000;
echo "Time: $timeReflection ms" . PHP_EOL;
Result: Time: 1218.0068492889 ms
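get_object_vars()
get_object_vars() returns an associative array of the object's properties that are accessible from the calling scope, keyed by their plain (unmangled) names. This is a problem because it breaks serialization of classes with inherited private properties: __sleep() cannot return the plain name of a private property declared in a parent class.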
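The scope dependence is easy to demonstrate. In this sketch (the classes Base and Derived are hypothetical), the same Derived instance is inspected from a method declared in Base: get_object_vars() only sees what Base can access, under plain names, while the (array) cast sees every property under its mangled name:

```php
<?php
// Sketch: get_object_vars() depends on the calling scope, the (array) cast does not.
// Base, Derived and their properties are hypothetical.
class Base {
    private $basePrivate = 'b';
    public function varsFromBase(): array {
        return get_object_vars($this);
    }
    public function castFromBase(): array {
        return (array) $this;
    }
}
class Derived extends Base {
    private $derivedPrivate = 'd';
}
$obj = new Derived();
$vars = $obj->varsFromBase(); // plain names, only what Base can access
$cast = $obj->castFromBase(); // mangled names, every property
```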
__serialize() and __unserialize()
This seems to be the recommended approach. However, it comes with a large programming overhead: you would have to implement this behaviour on every class in your hierarchy. Plus, we want PHP to do the serialization, so that we don't have to implement __unserialize() manually.
Example taken from the PHP RFC "New custom object serialization mechanism":
class A {
    private $prop_a;
    public function __serialize(): array {
        return ['prop_a' => $this->prop_a];
    }
    public function __unserialize(array $data) {
        $this->prop_a = $data['prop_a'];
    }
}
class B extends A {
    private $prop_b;
    public function __serialize(): array {
        return [
            'prop_b' => $this->prop_b,
            'parent_data' => parent::__serialize(),
        ];
    }
    public function __unserialize(array $data) {
        parent::__unserialize($data['parent_data']);
        $this->prop_b = $data['prop_b'];
    }
}
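Applied to the original question, __serialize() inverts the __sleep() idea: instead of naming what to drop, you return only the state you want to keep, and __unserialize() decides how the transient part is restored. A minimal single-class sketch, assuming PHP 7.4+ (`Report`, `$title`, `$rows` and the getters are hypothetical):

```php
<?php
// Sketch: excluding a transient property with __serialize()/__unserialize().
// Report, $title, $rows, getTitle() and rowCount() are hypothetical.
class Report {
    private string $title;
    private array $rows = []; // transient: large, rebuilt on demand
    public function __construct(string $title) {
        $this->title = $title;
        $this->rows = range(1, 1000); // stand-in for a huge table dump
    }
    public function __serialize(): array {
        // Only the persistent state is returned; $rows is simply left out.
        return ['title' => $this->title];
    }
    public function __unserialize(array $data): void {
        $this->title = $data['title'];
        $this->rows = []; // rebuild lazily when actually needed
    }
    public function getTitle(): string { return $this->title; }
    public function rowCount(): int { return count($this->rows); }
}
$report = new Report('Q3');
$serialized = serialize($report);
$copy = unserialize($serialized);
```

The trade-off is the one described above: every class in a hierarchy has to list its own state and chain to its parent.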
Upvotes: 0
Reputation: 21007
PHP provides the __sleep() magic method, which allows you to choose which properties are serialized.
EDIT: I've tested how __sleep() works when inheritance comes into play:
<?php
class A {
    private $a = 'String a';
    private $b = 'String b';
    public function __sleep() {
        echo "Sleep A\n";
        return array( 'a');
    }
}
class B extends A {
    private $c = 'String c';
    private $d = 'String d';
    public function __sleep() {
        echo "Sleep B\n";
        return array( 'c');
    }
}
class C extends A {
    private $e = 'String e';
    private $f = 'String f';
    public function __sleep() {
        echo "Sleep C\n";
        return array_merge( parent::__sleep(), array( 'e'));
    }
}
$a = new A();
$b = new B();
$c = new C();
echo serialize( $a) ."\n"; // Result: O:1:"A":1:{s:4:"Aa";s:8:"String a";}
// called "Sleep A" (correct)
echo serialize( $b) ."\n"; // Result: O:1:"B":1:{s:4:"Bc";s:8:"String c";}
// called just "Sleep B" (incorrect)
echo serialize( $c) ."\n"; // Caused: PHP Notice: serialize(): "a" returned as member variable from __sleep() but does not exist ...
// When you declare `private $a` as `protected $a` that class C returns:
// O:1:"C":2:{s:4:"*a";s:8:"String a";s:4:"Ce";s:8:"String e";}
// which is correct and called are both: "Sleep C" and "Sleep A"
So it seems that you can serialize parent data only if it's declared as protected :-/
EDIT 2: I've tried it with the Serializable interface, using the following code:
<?php
class A implements Serializable {
    private $a = '';
    private $b = '';
    // Just initialize strings outside default values
    public function __construct(){
        $this->a = 'String a';
        $this->b = 'String b';
    }
    public function serialize() {
        return serialize( array( 'a' => $this->a));
    }
    public function unserialize( $data){
        $array = unserialize( $data);
        $this->a = $array['a'];
    }
}
class B extends A {
    private $c = '';
    private $d = '';
    // Just initialize strings outside default values
    public function __construct(){
        $this->c = 'String c';
        $this->d = 'String d';
        parent::__construct();
    }
    public function serialize() {
        return serialize( array( 'c' => $this->c, '__parent' => parent::serialize()));
    }
    public function unserialize( $data){
        $array = unserialize( $data);
        $this->c = $array['c'];
        parent::unserialize( $array['__parent']);
    }
}
$a = new A();
$b = new B();
echo serialize( $a) ."\n";
echo serialize( $b) ."\n";
$a = unserialize( serialize( $a)); // C:1:"A":29:{a:1:{s:1:"a";s:8:"String a";}}
$b = unserialize( serialize( $b)); // C:1:"B":81:{a:2:{s:1:"c";s:8:"String c";s:8:"__parent";s:29:"a:1:{s:1:"a";s:8:"String a";}";}}
print_r( $a);
print_r( $b);
/** Results:
A Object
(
[a:A:private] => String a
[b:A:private] =>
)
B Object
(
[c:B:private] => String c
[d:B:private] =>
[a:A:private] => String a
[b:A:private] =>
)
*/
So, to sum up: you can serialize classes via __sleep() only if they don't have private members in a superclass (which need to be serialized as well). You can serialize complex objects by implementing the Serializable interface, but it brings some programming overhead.
Upvotes: 7
Reputation: 191779
You can use __sleep and __wakeup. For the former, you provide an array of the names of the object properties you want serialized; omit "transient" members from this list.
__wakeup is called immediately after an instance is unserialized. You could use it, for example, to refill the transient properties under certain conditions.
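A minimal sketch of that pairing for a single class with no private parent properties (`TableCache`, `$tableName`, `$rows` and `loadRows()` are hypothetical; `loadRows()` stands in for an expensive query):

```php
<?php
// Sketch: __sleep() lists the properties to keep, __wakeup() rebuilds the rest.
// TableCache, $tableName, $rows and loadRows() are hypothetical names.
class TableCache {
    private $tableName;
    private $rows = []; // transient
    public function __construct($tableName) {
        $this->tableName = $tableName;
        $this->loadRows();
    }
    private function loadRows() {
        // Stand-in for an expensive query that rebuilds the data.
        $this->rows = array_fill(0, 3, $this->tableName . ' row');
    }
    public function __sleep() {
        // 'rows' is omitted, so it is never written out.
        return ['tableName'];
    }
    public function __wakeup() {
        // Refill the transient property after unserialization.
        $this->loadRows();
    }
    public function getRows() { return $this->rows; }
}
$cache = new TableCache('users');
$serialized = serialize($cache);
$copy = unserialize($serialized);
```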
Upvotes: 0