Reputation: 305
I would like some clarification after reading about the internal implementation of zvals described in the articles "Internal value representation in PHP 7 - Part 1" and "Internal value representation in PHP 7 - Part 2".
Before explaining my confusion in detail, I think it can be summed up as follows:
(i) I do not see why reference counting for objects differs from that for arrays and strings, and
(ii) I do not see why reference counting for strings differs from that for arrays.
From what I understand, "complex" data types like strings, arrays, and objects are reference-counted types (in PHP 7.4).
1. So based on this simplistic view, I would imagine the reference counting for arrays, strings, and objects in PHP is the same, but it appears not to be the case. Can someone explain where this (apparently) overly simplistic view falls apart?
I used the 'debug_zval_dump' function to count references (I do not have privileges to install xdebug to use 'xdebug_debug_zval'). In my subsequent analysis, I am assuming that the function's own parameter can influence the reference counts displayed.
2. For objects, the following counts make sense to me, but it would be great to have confirmation.
class Foo{}
$a = new Foo();
debug_zval_dump($a); // #1: $a refcount = 2 -- and assume this is run after each line of code for each variable
$b = $a; // #2: $a, $b refcount = 3
$c = &$a; // #3: $a, $b, and $c refcount = 3
$a = new Foo(); // #4 $a, $b, and $c refcount = 2
Just to be clear: every time a variable is defined or updated, I pass all existing variables to debug_zval_dump in a PHP 7.4 engine. That's what the refcounts refer to. I'm just saving space.
#1: There are two _zval_structs pointing to the object (one is the function parameter).
#2: There are three _zval_structs pointing to the object (counting the function parameter).
#3: The function parameter, $b, and a zend_reference (shared by $a and $c) point to the object.
#4: $a and $c refer to the new object through a common zend_reference (plus the function parameter), while $b and the function parameter refer to the original object.
Is this counting wrong? Please correct me if so. Otherwise, we move to the more confusing items:
3. Arrays:
$a=[]; // #1: $a refcount = 1
$b=$a; // #2: $a, $b refcount = 1
$c=&$a; // #3: $a, $b, and $c refcount = 1
$a[]=0; // #4: $a and $c refcount = 2, $b refcount = 1
I would expect the same numbers as for 2.#1-#4, and that's not what we get. This appears to be at odds with the linked PHP articles, since I would expect something more like the following at #4:
$a, $c -> zend_reference1(refcount=2) -> zend_array2(refcount=2,value=[0])
$parameter -------------------------------^
$b, $parameter -> zend_array1(refcount=2,value=[])
4. Strings are counted differently yet again.
$a=''; // #1: $a refcount = 1
$b=$a; // #2: $a, $b refcount = 1
$c=&$a; // #3: $a, $b, and $c refcount = 1
$a='foo'; // #4: $a, $b, and $c refcount = 1
I would have the same diagram here as for 3.
What details am I overlooking for this reference counting?
5. As a bonus, what happens when a reference is made to a number? For example:
$a=0;
$b=&$a; // $a, b -> zend_reference(refcount=2) -> zend_value(value=0)
Is the comment diagram correct, assuming that zend_value is stack based (since numeric values are not reference counted)?
Upvotes: 0
Views: 302
Reputation: 97898
"So based on this simplistic view, I would imagine the reference counting for arrays, strings, and objects in PHP is the same, but it appears not to be the case. Can someone explain where this (apparently) overly simplistic view falls apart?"
It falls apart because the way variables are represented is based not just on their type, but also on how they're defined and used. In particular, copy-on-write isn't always the most efficient way to handle a simple value, and the compiler can perform other optimisations instead.
This is hard to see with debug_zval_dump, because passing a variable to the function changes its representation, and because there are details it doesn't show. Using the xdebug_debug_zval function provided by Xdebug instead, we get a bit more information...
For the object case, it's fairly straightforward:
class Foo{}
$a = new Foo();
// a: (refcount=1, is_ref=0)=class Foo { }
$b = $a;
// a: (refcount=2, is_ref=0)=class Foo { }
// b: (refcount=2, is_ref=0)=class Foo { }
$c = &$a;
// a: (refcount=2, is_ref=1)=class Foo { }
// b: (refcount=2, is_ref=0)=class Foo { }
// c: (refcount=2, is_ref=1)=class Foo { }
$a = new Foo();
// a: (refcount=2, is_ref=1)=class Foo { }
// b: (refcount=1, is_ref=0)=class Foo { }
// c: (refcount=2, is_ref=1)=class Foo { }
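After $c = &$a, $a and $c point at the same IS_REFERENCE zval (refcount=2); that zval and $b both point at the same object (refcount=2). After the final assignment, the reference points at the new object, and $b is the only thing still pointing at the original one (refcount=1).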
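The visible consequence of that layout can be checked without Xdebug. The following is a minimal sketch using only built-in PHP (spl_object_id is a standard function; the variable $firstId is just an illustrative name): assigning through the reference changes what $a and $c see, while $b keeps the original object.

```php
<?php
class Foo {}

$a = new Foo();
$b = $a;        // $b is a separate variable holding the same object
$c = &$a;       // $a and $c now share one reference
$firstId = spl_object_id($b);

$a = new Foo(); // written through the reference, so $c follows and $b does not

var_dump($a === $c);                       // $a and $c hold the same (new) object
var_dump($b === $a);                       // $b still holds the first object
var_dump(spl_object_id($b) === $firstId);  // the id of $b's object is unchanged
```

This only shows the behaviour the refcounts are there to support; it does not display the counts themselves.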
Now let's look at the array case:
$a=[];
// a: (immutable, is_ref=0)=array ()
$b=$a;
// a: (immutable, is_ref=0)=array ()
// b: (immutable, is_ref=0)=array ()
$c=&$a;
// a: (refcount=2, is_ref=1)=array ()
// b: (immutable, is_ref=0)=array ()
// c: (refcount=2, is_ref=1)=array ()
$a[]=0;
// a: (refcount=2, is_ref=1)=array (0 => (refcount=0, is_ref=0)=0)
// b: (immutable, is_ref=0)=array ()
// c: (refcount=2, is_ref=1)=array (0 => (refcount=0, is_ref=0)=0)
The empty arrays don't show a refcount at all; they show "immutable". Empty arrays are common and interchangeable, so a special case avoids allocating lots of separate zvals with the same content.
If we change the array to not be empty, we get something different:
$a=[42];
// a: (refcount=2, is_ref=0)=array (0 => (refcount=0, is_ref=0)=42)
$b=$a;
// a: (refcount=3, is_ref=0)=array (0 => (refcount=0, is_ref=0)=42)
// b: (refcount=3, is_ref=0)=array (0 => (refcount=0, is_ref=0)=42)
$c=&$a;
// a: (refcount=2, is_ref=1)=array (0 => (refcount=0, is_ref=0)=42)
// b: (refcount=3, is_ref=0)=array (0 => (refcount=0, is_ref=0)=42)
// c: (refcount=2, is_ref=1)=array (0 => (refcount=0, is_ref=0)=42)
$a[]=0;
// a: (refcount=2, is_ref=1)=array (0 => (refcount=0, is_ref=0)=42, 1 => (refcount=0, is_ref=0)=0)
// b: (refcount=2, is_ref=0)=array (0 => (refcount=0, is_ref=0)=42)
// c: (refcount=2, is_ref=1)=array (0 => (refcount=0, is_ref=0)=42, 1 => (refcount=0, is_ref=0)=0)
The "immutable" has gone, but something's odd: the refcount starts at 2 even when we only assign one variable. This is the influence of a "Compiled Variable": the compiler has pre-allocated the zval containing [42], so it needs to manage the memory for it. To avoid the normal memory management freeing it too soon, it adds an extra counted reference to the zval.
To defeat that optimization, let's create an array that can only be created at run-time:
$a=[rand()];
// a: (refcount=1, is_ref=0)=array (0 => (refcount=0, is_ref=0)=713417292)
$b=$a;
// a: (refcount=2, is_ref=0)=array (0 => (refcount=0, is_ref=0)=713417292)
// b: (refcount=2, is_ref=0)=array (0 => (refcount=0, is_ref=0)=713417292)
$c=&$a;
// a: (refcount=2, is_ref=1)=array (0 => (refcount=0, is_ref=0)=713417292)
// b: (refcount=2, is_ref=0)=array (0 => (refcount=0, is_ref=0)=713417292)
// c: (refcount=2, is_ref=1)=array (0 => (refcount=0, is_ref=0)=713417292)
$a[]=0;
// a: (refcount=2, is_ref=1)=array (0 => (refcount=0, is_ref=0)=713417292, 1 => (refcount=0, is_ref=0)=0)
// b: (refcount=1, is_ref=0)=array (0 => (refcount=0, is_ref=0)=713417292)
// c: (refcount=2, is_ref=1)=array (0 => (refcount=0, is_ref=0)=713417292, 1 => (refcount=0, is_ref=0)=0)
Finally, things look more like the object case!
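What those counts mean in practice is copy-on-write: the array is only duplicated when one side writes to it. A minimal sketch of the same sequence, checking contents rather than refcounts (plain PHP, no Xdebug assumed):

```php
<?php
$a = [rand()];  // array created at run time, as in the example above
$b = $a;        // shares the same array until someone writes
$c = &$a;       // $a and $c share one reference

$a[] = 0;       // the write separates the array from $b; $c sees it via the reference

var_dump(count($a)); // int(2)
var_dump(count($b)); // int(1)
var_dump(count($c)); // int(2)
```

The copy made at the write is why $b drops back to refcount=1 in the dump above while $a and $c stay joined.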
On to strings...
$a='';
// a: (interned, is_ref=0)=''
$b=$a;
// a: (interned, is_ref=0)=''
// b: (interned, is_ref=0)=''
$c=&$a;
// a: (refcount=2, is_ref=1)=''
// b: (interned, is_ref=0)=''
// c: (refcount=2, is_ref=1)=''
$a='foo';
// a: (refcount=2, is_ref=1)='foo'
// b: (interned, is_ref=0)=''
// c: (refcount=2, is_ref=1)='foo'
Like the empty array, the empty string isn't showing a refcount; it's showing "interned". Again, the compiler has decided not to allocate a new zval and to use some shared memory instead. ('foo' is probably also interned, but Xdebug is showing us the IS_REFERENCE zval pointing to it.)
Let's pick something non-empty instead:
$a='hello';
// a: (interned, is_ref=0)='hello'
$b=$a;
// a: (interned, is_ref=0)='hello'
// b: (interned, is_ref=0)='hello'
$c=&$a;
// a: (refcount=2, is_ref=1)='hello'
// b: (interned, is_ref=0)='hello'
// c: (refcount=2, is_ref=1)='hello'
$a='foo';
// a: (refcount=2, is_ref=1)='foo'
// b: (interned, is_ref=0)='hello'
// c: (refcount=2, is_ref=1)='foo'
Unlike the non-empty array, this hasn't made any difference: the compiler can "intern" any constant string it sees in the source code.
So we need to defeat the optimization again:
$a=(string)rand();
// a: (refcount=1, is_ref=0)='522057011'
$b=$a;
// a: (refcount=2, is_ref=0)='522057011'
// b: (refcount=2, is_ref=0)='522057011'
$c=&$a;
// a: (refcount=2, is_ref=1)='522057011'
// b: (refcount=2, is_ref=0)='522057011'
// c: (refcount=2, is_ref=1)='522057011'
$a='foo';
// a: (refcount=2, is_ref=1)='foo'
// b: (refcount=1, is_ref=0)='522057011'
// c: (refcount=2, is_ref=1)='foo'
Once again, it matches the object example!
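If Xdebug isn't available, the contrast between a literal and a run-time string can still be glimpsed with debug_zval_dump. The sketch below uses a hypothetical helper, dump_to_string, that just captures the function's output; the exact text differs between PHP versions, and the counts include extra references held by function parameters, so it is only useful for comparing the two lines with each other.

```php
<?php
// Hypothetical helper: capture debug_zval_dump() output as a string.
// Passing the value through this wrapper adds parameter references of its own,
// so the absolute counts are not meaningful, only the difference between calls.
function dump_to_string($value): string
{
    ob_start();
    debug_zval_dump($value);
    return trim(ob_get_clean());
}

$literal = 'hello';          // constant string known at compile time
$runtime = (string) rand();  // string that only exists at run time

echo dump_to_string($literal), PHP_EOL;
echo dump_to_string($runtime), PHP_EOL;
```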
This is just a sampling of the optimizations involved in the current engine, and more will probably be added in future versions (the above was tested on PHP 7.4, 8.0, and 8.1).
Upvotes: 1