Hipolith

Reputation: 461

PHP - hash objects so that distinct objects with the same field values have the same hash

I am looking for a way to generate some kind of hash for a PHP object (a generic solution, working with all classes, built-in and custom, if possible).

SplObjectStorage::getHash is not what I'm looking for, as it generates a different hash for every instance of a given class. To illustrate the problem, let's consider a simple class:

class A {
public $field; //public only for simplicity
}

and 2 instances of that class:

$a = new A(); $a->field = 'b';
$b = new A(); $b->field = 'b';

Every built-in function I've tried returns different hashes for these objects, while I'd like to have some function f($x) with the property f($a) == f($b) => $a == $b.

I am aware I could write a function that traverses all of an object's properties recursively until it reaches values that can be cast to strings, concatenates these strings in some fancy way, and hashes the result, but the performance of such a solution would be awful.

Is there an efficient way to do this?

Upvotes: 7

Views: 1836

Answers (3)

Headbank

Reputation: 400

The question, in effect, asks for two things that may be at odds with one another.

  1. A method of hashing any object in a consistent (and performant) manner.
  2. Efficient comparison of objects using this hashing method.

First, the hash. Others have suggested serialize() for performance reasons, but it introduces one limitation: PHP objects can have fields added externally as well as those declared in the class. So it's possible (although unlikely, and certainly indicative of questionable coding practice) that your objects have the same fields, but assigned in a different order. This would produce different serializations, which, by the wording of your question, you would not want.
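A minimal sketch of the ordering problem, using stdClass so the properties can be added dynamically (dynamic properties on ordinary classes are deprecated as of PHP 8.2): PHP's loose comparison treats the two objects as equal, but their serializations differ.

```php
<?php
// Same properties and values, assigned in a different order.
$x = new stdClass();
$x->a = 1;
$x->b = 2;

$y = new stdClass();
$y->b = 2;
$y->a = 1;

// Loose comparison (==) checks properties and values regardless of order.
var_dump($x == $y); // bool(true)

// serialize() writes properties in insertion order, so the strings differ.
echo serialize($x), "\n"; // O:8:"stdClass":2:{s:1:"a";i:1;s:1:"b";i:2;}
echo serialize($y), "\n"; // O:8:"stdClass":2:{s:1:"b";i:2;s:1:"a";i:1;}
```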

To guard against this, you would need to cast the object to an array and sort its members by key. In case any fields are themselves objects or arrays that may have the same issue, you should work recursively.

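function sortObject($obj) {
  // Casting gives an array of the object's properties (arrays pass through unchanged).
  $arr = (array) $obj;
  // Sort by property name so that assignment order no longer matters.
  ksort($arr);
  foreach($arr as $k => $v) {
    // Nested arrays and objects may have the same ordering issue.
    if(is_array($v) || is_object($v)) {
      $arr[$k] = sortObject($v);
    }
  }
  return $arr;
}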
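A usage sketch with two objects whose properties were assigned in different orders: serializing and hashing the key-sorted array gives the same md5 for both. The wrapper name `orderInsensitiveHash()` is hypothetical, and sortObject() is repeated so the snippet runs on its own.

```php
<?php
// sortObject() as above: cast to array, sort by key, recurse into nested values.
function sortObject($obj) {
  $arr = (array) $obj;
  ksort($arr);
  foreach ($arr as $k => $v) {
    if (is_array($v) || is_object($v)) {
      $arr[$k] = sortObject($v);
    }
  }
  return $arr;
}

// Hypothetical helper: hash the canonical (key-sorted) representation.
function orderInsensitiveHash($obj) {
  return md5(serialize(sortObject($obj)));
}

$x = new stdClass();
$x->a = 1;
$x->b = 2;

$y = new stdClass();
$y->b = 2;
$y->a = 1;

echo orderInsensitiveHash($x) === orderInsensitiveHash($y) ? "same\n" : "different\n"; // same
```

Note that the class name is lost in the array cast, so objects of different classes with identical properties also get the same hash.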

The array returned by sortObject() is a consistent representation of the object that can be serialized and hashed. Alternatively, you can build the hash within the function itself:

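function hashObject($obj) {
  $arr = (array) $obj;
  ksort($arr);
  $hash = '';
  foreach($arr as $k => $v) {
    // Include the key, otherwise ['a' => 1] and ['b' => 1] would collide.
    $hash .= var_export($k, true).'=>';
    if(is_array($v)) {
      $hash .= '['.hashObject($v).']';
    } elseif(is_object($v)) {
      $hash .= '{'.hashObject($v).'}';
    } else {
      // true makes var_export() return the string instead of printing it.
      $hash .= var_export($v, true).';';
    }
  }
  return $hash;
}
//The brackets are added to preserve structure; ';' separates scalar values.
//Pass the result to md5() if you need a fixed-length hash.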

json_encode() could be used in place of var_export(), but I chose the latter to guarantee a faithful representation of PHP values (collisions might be possible in JSON, I'm not sure), and I suspect it might perform better.

Then again, what if an object contains circular references, e.g. it has an object or array field containing a value that is a reference back to it? serialize() can handle that; the recursive functions above cannot, and would recurse without end.
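A small check of the serialize() side of that claim: for an object it has already written, serialize() emits a back-reference (r:1; here) instead of descending again, so two self-referencing objects with the same shape serialize, and therefore hash, identically.

```php
<?php
// Each object holds a reference back to itself.
$p = new stdClass();
$p->name = 'node';
$p->self = $p;

$q = new stdClass();
$q->name = 'node';
$q->self = $q;

// The repeated object is written as r:1; rather than serialized again.
echo serialize($p), "\n"; // O:8:"stdClass":2:{s:4:"name";s:4:"node";s:4:"self";r:1;}

var_dump(md5(serialize($p)) === md5(serialize($q))); // bool(true)
```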

Now: comparing. The best approach to this is governed by what the answer is most likely to be. That is to say:

If you expect most of the pairs of objects you are comparing to differ, then it will be more efficient to compare them piece-by-piece, so you can establish difference as soon as possible.

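function matchObj($a, $b) {
  if(gettype($a) !== gettype($b)) {
    return false;
  }
  // Objects of different classes are never considered equal.
  if(is_object($a) && get_class($a) !== get_class($b)) {
    return false;
  }
  $arrA = (array) $a;
  $arrB = (array) $b;
  if(count($arrA) !== count($arrB)) {
    return false;
  }
  ksort($arrA);
  ksort($arrB);
  reset($arrB); // walk $arrB in step with $arrA
  foreach($arrA as $k => $v) {
    // After sorting, both arrays must have the same key at the same position.
    if($k !== key($arrB) || gettype($v) !== gettype($arrB[$k])) {
      return false;
    }
    if(is_array($v) || is_object($v)) {
      // Recurse, and stop at the first nested difference.
      if(!matchObj($v, $arrB[$k])) {
        return false;
      }
    } elseif($v !== $arrB[$k]) {
      return false;
    }
    next($arrB);
  }
  return true;
}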

If you expect most of the pairs to match, then you may as well hash each object in full and compare the hashes (this is likely to be more efficient with serialize() than with a hand-written recursive function like the ones above), because in most cases there is no shortcut to the answer anyway.
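When the same objects take part in many comparisons, computing each hash once pays off: the hashes can serve as array keys, so equal objects are found by lookup rather than by pairwise comparison. A minimal sketch using md5(serialize()) (so the property-order caveat from the start of this answer still applies); the helper name `uniqueByHash()` is hypothetical.

```php
<?php
class A {
    public $field;
}

// Hypothetical helper: keep the first object for each distinct hash.
function uniqueByHash(array $objects) {
  $seen = [];
  foreach ($objects as $obj) {
    $hash = md5(serialize($obj)); // computed once per object
    if (!isset($seen[$hash])) {
      $seen[$hash] = $obj;
    }
  }
  return array_values($seen);
}

$list = [];
foreach (['b', 'b', 'c'] as $value) {
  $obj = new A();
  $obj->field = $value;
  $list[] = $obj;
}

echo count(uniqueByHash($list)), "\n"; // 2
```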

Upvotes: 0

Evadecaptcha

Reputation: 1441

Assuming I understand you correctly, you could serialize the objects and then md5 the serialized string. Since serialization produces the same string if all properties are the same, you should get the same hash every time, unless your object has some kind of timestamp property. Example:

class A {
    public $field;
}
$a = new A;
$b = new A;
$a->field = 'test';
$b->field = 'test';
echo md5(serialize($a)) . "\n";
echo md5(serialize($b)) . "\n";

output:

0a0a68371e44a55cfdeabb04e61b70f7
0a0a68371e44a55cfdeabb04e61b70f7

Yours come out differently because PHP gives each object instance its own number (the object handle), which var_dump() shows after the #:

object(A)#1 (1) {...
object(A)#2 (1) {...
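To see the difference directly, a short sketch comparing the per-instance identifiers (spl_object_id() needs PHP 7.2 or later) with the serialized contents of the two objects from the question:

```php
<?php
class A {
    public $field;
}

$a = new A();
$a->field = 'b';
$b = new A();
$b->field = 'b';

// Identity-based: one value per live instance, so these differ.
var_dump(spl_object_id($a) === spl_object_id($b));     // bool(false)
var_dump(spl_object_hash($a) === spl_object_hash($b)); // bool(false)

// Content-based: same class and same property values, so these match.
var_dump(serialize($a) === serialize($b));             // bool(true)
```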

Upvotes: 5

Alister Bulman

Reputation: 35169

You appear to be talking about a Value Object. It is a pattern where such objects aren't compared by object identity, but by their contents: all, or some, of the properties that make up the object.

I'm using a number of them in a project:

public function equals(EmailAddress $address)
{
    return strtolower($this->address) === strtolower((string) $address);
}

A more complex object could simply add more items to the comparison:

return ($this->one === $address->getOne() && 
    $this->two === $address->getTwo());

Because the conditions are all joined with &&, the expression short-circuits to false as soon as any item does not match.
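A self-contained sketch of the pattern, using a hypothetical `Money` value object (the class, its fields and the hash() method are illustrative, not from the answer): equals() compares the fields that define the value, joined with && so it stops at the first mismatch, and hash() is built from exactly the same fields.

```php
<?php
// Hypothetical value object: two instances are equal when all fields match.
final class Money
{
    private $amount;   // integer minor units, e.g. cents
    private $currency; // e.g. 'EUR'

    public function __construct(int $amount, string $currency)
    {
        $this->amount = $amount;
        $this->currency = $currency;
    }

    public function getAmount(): int
    {
        return $this->amount;
    }

    public function getCurrency(): string
    {
        return $this->currency;
    }

    public function equals(Money $other): bool
    {
        // && short-circuits: currency is not compared if the amounts differ.
        return $this->amount === $other->getAmount()
            && $this->currency === $other->getCurrency();
    }

    // Built from the same fields as equals(), so equal objects share a hash.
    public function hash(): string
    {
        return md5(serialize([$this->amount, $this->currency]));
    }
}

$x = new Money(500, 'EUR');
$y = new Money(500, 'EUR');
$z = new Money(500, 'USD');

var_dump($x->equals($y)); // bool(true)
var_dump($x->equals($z)); // bool(false)
```

Because the object itself decides which fields matter, no generic reflection or recursion is needed to hash or compare it.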

Upvotes: 0
