Reputation: 461
I am looking for a way to generate some kind of hash for a PHP object (a generic solution working with all classes, built-in and custom, if possible).
SplObjectStorage::getHash is not what I'm looking for, as it generates a different hash for every instance of a given class. To picture the problem, let's consider a simple class:
class A {
    public $field; // public only for simplicity
}
and 2 instances of that class:
$a = new A(); $a->field = 'b';
$b = new A(); $b->field = 'b';
Every built-in function I've tried returns different hashes for these objects, while I'd like to have some function f($x) with the property f($a) == f($b) => $a == $b.
I am aware I could write a function that traverses all of the object's properties recursively until it finds a property that can be cast to a string, concatenates these strings in some fancy way and hashes the result, but the performance of such a solution would be awful.
Is there an efficient way to do this?
Upvotes: 7
Views: 1836
Reputation: 400
The question, in effect, asks for two things that may be at odds with one another.
First, the hash. Others have suggested serialize() for performance reasons, but it does introduce one limitation: PHP objects can have fields added externally as well as those declared in the class. So it's possible (although unlikely, and certainly indicative of questionable coding practice) that your objects could have the same fields, but added in a different order. This would produce different serializations, which by the wording of your question you would not want.
To guard against this, you would need to cast the object to an array and sort its members. In case any fields are themselves objects or arrays that may have the same issue, you should work recursively.
function sortObject($obj) {
    $arr = (array) $obj;
    ksort($arr);
    foreach ($arr as $k => $v) {
        if (is_array($v) || is_object($v)) {
            $arr[$k] = sortObject($v);
        }
    }
    return $arr;
}
This provides a consistent representation of the object that can be serialized and hashed. Alternatively you can actually build the hash within the function itself:
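function hashObject($obj) {
    $arr = (array) $obj;
    ksort($arr);
    $hash = '';
    foreach ($arr as $k => $v) {
        // Include the key so that different field names give different hashes.
        $hash .= var_export($k, true) . '=>';
        if (is_array($v)) {
            $hash .= '[' . hashObject($v) . ']';
        } elseif (is_object($v)) {
            $hash .= '{' . hashObject($v) . '}';
        } else {
            // The second argument makes var_export() return the string instead of printing it.
            $hash .= var_export($v, true);
        }
        $hash .= ',';
    }
    return $hash;
}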
//The brackets are added to preserve structure.
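To see why key order matters for the serialize() approach, here is a small sketch using stdClass objects whose fields are added in a different order. It repeats sortObject() from above so that it runs on its own; the helper name normalizedHash() and the md5(serialize(...)) wrapper are assumptions made for illustration, not part of the original answer.

```php
<?php
// sortObject() as defined above, repeated so the sketch is self-contained.
function sortObject($obj) {
    $arr = (array) $obj;
    ksort($arr);
    foreach ($arr as $k => $v) {
        if (is_array($v) || is_object($v)) {
            $arr[$k] = sortObject($v);
        }
    }
    return $arr;
}

// Hypothetical helper: hash the normalized (key-sorted) representation.
function normalizedHash($obj) {
    return md5(serialize(sortObject($obj)));
}

$a = new stdClass();
$a->x = 1;
$a->y = 2;

$b = new stdClass();
$b->y = 2; // same fields, added in the opposite order
$b->x = 1;

var_dump(serialize($a) === serialize($b));           // bool(false)
var_dump(normalizedHash($a) === normalizedHash($b)); // bool(true)
```

Note that casting to an array drops the class name, so with this sketch two objects of different classes that hold the same fields would hash identically.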
In hashObject(), json_encode() could be used in place of var_export(), but I chose the latter to guarantee a faithful representation of PHP values (collisions might be possible in JSON; I'm not sure), and I suspect it might perform better.
Then again, what if an object contains circular references, e.g. it has an object or array field containing a value that is a reference back to it? serialize() can handle that; the functions above cannot.
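As a rough illustration of the circular-reference case, the sketch below builds two objects that each point back to themselves and hashes them via serialize(). The Node class and its property names are hypothetical and exist only for this example.

```php
<?php
// Hypothetical class used only for this illustration.
class Node {
    public $name;
    public $self;
}

$a = new Node();
$a->name = 'n';
$a->self = $a; // the object refers back to itself

$b = new Node();
$b->name = 'n';
$b->self = $b;

// serialize() records the cycle as a back-reference instead of recursing forever.
echo serialize($a), "\n";

// Two structurally identical cyclic objects produce the same hash.
var_dump(md5(serialize($a)) === md5(serialize($b))); // bool(true)
```

Passing either object to a naive recursive function such as sortObject() or hashObject() would recurse without end, which is the limitation described above.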
Now: comparing. The best approach to this is governed by what the answer is most likely to be. That is to say:
If you expect most of the pairs of objects you are comparing to differ, then it will be more efficient to compare them piece-by-piece, so you can establish difference as soon as possible.
function matchObj($a, $b) {
    if (gettype($a) !== gettype($b)) {
        return false;
    }
    $arrA = (array) $a;
    $arrB = (array) $b;
    if (count($arrA) !== count($arrB)) {
        return false;
    }
    ksort($arrA);
    ksort($arrB);
    reset($arrB); // make sure the internal pointer starts at the first key
    foreach ($arrA as $k => $v) {
        // Keys must line up, and the values must be of the same type.
        if ($k !== key($arrB) || gettype($v) !== gettype($arrB[$k])) {
            return false;
        }
        if (is_array($v) || is_object($v)) {
            // Recurse into nested structures and bail out on the first mismatch.
            if (!matchObj($v, $arrB[$k])) {
                return false;
            }
        } elseif ($v !== $arrB[$k]) {
            return false;
        }
        next($arrB);
    }
    return true;
}
If you expect most of the pairs to match, then you may as well hash each object in full and compare the hashes (which is likely to work out more efficient if using serialize() than your own recursive function like the above), because you won't have much of a shortcut to the answer in each instance anyway.
Upvotes: 0
Reputation: 1441
Assuming I understand you correctly, you could serialize the objects then md5 the serialized object. Since the serialization creates the same string if all properties are the same, you should get the same hash every time. Unless your object has some kind of timestamp property. Example:
class A {
public $field;
}
$a = new A;
$b = new A;
$a->field = 'test';
$b->field = 'test';
echo md5(serialize($a)) . "\n";
echo md5(serialize($b)) . "\n";
output:
0a0a68371e44a55cfdeabb04e61b70f7
0a0a68371e44a55cfdeabb04e61b70f7
Yours are coming out differently because each instantiated object is stored in PHP's memory with its own numbered id:
object(A)#1 (1) {...
object(A)#2 (1) {...
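A small sketch of the difference between identity-based and content-based hashes follows. It uses spl_object_hash() as one example of an identity-based built-in; that choice is an assumption for illustration, since the question does not list which functions were tried.

```php
<?php
class A {
    public $field;
}

$a = new A;
$b = new A;
$a->field = 'test';
$b->field = 'test';

// Identity-based: differs for two distinct live instances.
var_dump(spl_object_hash($a) === spl_object_hash($b)); // bool(false)

// Content-based: identical when class and property values match.
var_dump(md5(serialize($a)) === md5(serialize($b)));   // bool(true)
```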
Upvotes: 5
Reputation: 35169
You appear to be talking about a Value Object. It is a pattern where objects are compared not by object identity but by their contents: some or all of the properties that make up the object.
I'm using a number of them in a project:
public function equals(EmailAddress $address)
{
return strtolower($this->address) === strtolower((string) $address);
}
A more complex object could simply add more items into the comparison function.
return ($this->one === $address->getOne() &&
$this->two === $address->getTwo());
Such conditionals (all joined with '&&') short-circuit to false as soon as any item does not match.
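For context, a minimal self-contained sketch of such a value object might look like this. Only equals() comes from the snippet above; the constructor, the private $address property layout and __toString() are assumptions added so the example runs.

```php
<?php
// Sketch of a value object; constructor and __toString() are assumptions.
final class EmailAddress
{
    private $address;

    public function __construct($address)
    {
        $this->address = (string) $address;
    }

    public function __toString()
    {
        return $this->address;
    }

    // Compare by content (case-insensitively), not by object identity.
    public function equals(EmailAddress $address)
    {
        return strtolower($this->address) === strtolower((string) $address);
    }
}

$x = new EmailAddress('User@Example.com');
$y = new EmailAddress('user@example.com');
$z = new EmailAddress('other@example.com');

var_dump($x->equals($y)); // bool(true)
var_dump($x->equals($z)); // bool(false)
```

The trade-off is that each class has to define what equality means for it, so this is not the generic hash the question asks for, but it keeps the comparison explicit and cheap.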
Upvotes: 0