Reputation: 22941
I'd like to generate a unique identifier based on the content of an array. My initial approach was to simply do:
$key = md5(json_encode($array));
However, I'd like to be absolutely sure that the key is unique, and there is a remote possibility that two distinct arrays could produce the same md5 hash. My current idea is to do:
$key = base64_encode(json_encode($array));
This is guaranteed to be unique but produces quite a long key. Can I use sha512 or does this type of hash also have the same potential for key collision as md5? Is there any way to generate a shorter key than the base64 method which is 100% guaranteed to be unique?
To be 100% clear, my question is: How can I generate the shortest possible 100% unique identifier for a set of data?
Upvotes: 3
Views: 2582
Reputation: 3549
If you want a 100% guaranteed unique key to match your content, then the only way is to use the full length of your content. You can use the json_encoded string as-is, or you could run it through base64_encode() or bin2hex() or similar if you want a string that doesn't have any "special" characters. Any hash function like md5, sha1, sha256 etc. cannot be 100% unique: the output has a fixed length, so by the pigeonhole principle (https://en.wikipedia.org/wiki/Pigeonhole_principle) there are more possible inputs than possible hash values, and some distinct inputs must share a hash.
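To make the difference concrete, here is a minimal sketch comparing the reversible encodings with a fixed-length hash. The sample array is made up for illustration; the point is only that base64 and hex output can be decoded back to the original JSON (which is why they are unique), while the md5 digest is always 32 hex characters no matter how large the input is.

```php
<?php
// Hypothetical sample data; any JSON-encodable array would do.
$array = ['id' => 42, 'tags' => ['a', 'b']];

$json = json_encode($array);

// Reversible encodings: unique because the original can be recovered.
$b64 = base64_encode($json);
$hex = bin2hex($json);

// Fixed-length digest: 32 hex characters regardless of input size,
// so distinct inputs can (in principle) share the same value.
$md5 = md5($json);

// Both encodings round-trip back to the exact JSON string.
var_dump(base64_decode($b64) === $json);
var_dump(hex2bin($hex) === $json);

// Compare key lengths: the encodings grow with the input, md5 does not.
printf("json=%d base64=%d hex=%d md5=%d\n",
    strlen($json), strlen($b64), strlen($hex), strlen($md5));
```

The encoded keys get longer as the array grows (hex is twice the JSON length, base64 roughly 4/3 of it), which is the price of a guarantee.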
In practice, md5 and sha1 collisions have now been published, but stronger hash functions such as sha256 and sha512 have no known collisions, and none are expected for a long time. So you could also use a modern hash algorithm and be fairly safe that you will not get any duplicates.
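If a practically unique (rather than provably unique) key is acceptable, a sketch along these lines would work. The function name contentKey and the sample data are hypothetical; hash() and json_encode() are PHP built-ins.

```php
<?php
// Hypothetical helper, not part of PHP or the original question.
// Returns a fixed-length hex key derived from the array's JSON form.
function contentKey(array $data, string $algo = 'sha256'): string
{
    $json = json_encode($data);
    if ($json === false) {
        // json_encode() returns false on failure (e.g. invalid UTF-8).
        throw new InvalidArgumentException(
            'Data could not be JSON-encoded: ' . json_last_error_msg()
        );
    }
    return hash($algo, $json);
}

$key256 = contentKey(['id' => 42, 'tags' => ['a', 'b']]);
$key512 = contentKey(['id' => 42, 'tags' => ['a', 'b']], 'sha512');

// sha256 yields 64 hex characters, sha512 yields 128.
echo strlen($key256), ' ', strlen($key512), "\n";
```

Collisions remain theoretically possible with either algorithm, so this trades the 100% guarantee for a short, fixed-length key.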
Upvotes: 7