Reputation: 72642
This has been asked numerous times here on SO. But I haven't found a solution for my problem.
I want to create a short hash (say, at most 8 characters) for an invitation system. I cannot use base[X] encoding
because that would be too easy to guess. I cannot simply truncate, e.g., an MD5
hash, because I think collisions will show up sooner or later.
Is there a solution for this?
Upvotes: 7
Views: 11168
Reputation: 431
If you want your invitation code to be unique (100% safe from collisions) and hard to guess at the same time, you can build it from two parts: one that is unique and one that is hard to guess. It will not be a hash per se, but it will look cryptic enough to the recipient.
// Thanks to https://stackoverflow.com/questions/4356289/php-random-string-generator
function generateRandomString($length, $characters)
{
    $charactersLength = strlen($characters);
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() (PHP 7+) is cryptographically secure; rand() is not
        $randomString .= $characters[random_int(0, $charactersLength - 1)];
    }
    return $randomString;
}
function generateUniqueHardToGuessCode($length, $id)
{
    $allowedCharacters = '123456789ABCDEFGHJKLMNPQRSTUVWXYZ';
    // exclude 0, O and I from allowed characters to avoid confusion:
    // 0/O and I/l pairs can look very similar in some fonts like Arial
    $uniquePart = strtoupper(base_convert($id, 10, 32));
    // base_convert(.., .., 32) returns the string of the following
    // 32 characters: "0123456789abcdefghijklmnopqrstuv"
    // "wxy" characters are left off to replace "0OI" we want to exclude,
    // "z" character will serve as a separator between random and unique
    // parts to prevent situations when shorter unique part combined
    // with random characters happens to match the longer unique part
    // of another code, e.g.:
    // ABC (unique) + DEFG (random) = ABCD (unique) + EFG (random)
    $uniquePart = strtr($uniquePart, '0OI', 'WXY');
    $randomPartLength = $length - strlen($uniquePart) - 1; // 1 for separator
    if ($randomPartLength < 1) {
        throw new Exception("The length of $length characters is not enough to create hard to guess code for ID $id");
    }
    $randomPart = generateRandomString($randomPartLength, $allowedCharacters);
    return $randomPart . 'Z' . $uniquePart;
}
for ($id = 0; $id < 10; $id++) {
    echo generateUniqueHardToGuessCode(8, $id), PHP_EOL;
}
The above snippet will output invitation codes like this:
A33UAEZW
DCBY6EZ1
985Z17Z2
REBYBTZ3
XLLRGTZ4
AEP5WBZ5
UKQNGNZ6
CTHRTXZ7
CRTAWKZ8
GJB9PXZ9
If you want them to appear even more random, including last digits, you can pre-generate a pool of them as @user984869 suggested.
Please note the exception this snippet throws when the desired code length is not enough to contain both parts. It is inevitable if we want the length to be fixed. Fixed length also makes invitation codes with longer unique parts easier to guess because of shorter random parts.
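To put rough numbers on that trade-off, the sketch below counts how many random characters are left in an 8-character code for a few IDs, and how many distinct random parts that allows with the 33-character alphabet above. The helper name randomPartLengthFor is made up for this illustration and is not part of the original snippet.

```php
<?php
// Hypothetical helper: length of the random part in the fixed-length scheme.
function randomPartLengthFor(int $length, int $id): int
{
    $uniquePartLength = strlen(base_convert((string) $id, 10, 32));
    return $length - $uniquePartLength - 1; // 1 for the separator
}

foreach ([9, 1000, 100000, 10000000] as $id) {
    $n = randomPartLengthFor(8, $id);
    // 33 ** $n is the number of possible random parts of length $n
    printf("ID %d: %d random characters, %d possible random parts%s", $id, $n, 33 ** $n, PHP_EOL);
}
```

With these inputs the random part shrinks from 6 characters (about 1.3 billion possibilities) for ID 9 to 2 characters (1,089 possibilities) for ID 10,000,000.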
That is why I would prefer a fixed length random part and dynamically growing unique part:
function generateUniqueHardToGuessCode($randomPartLength, $id)
{
    $allowedCharacters = '123456789ABCDEFGHJKLMNPQRSTUVWXYZ';
    // exclude 0, O and I from allowed characters to avoid confusion:
    // 0/O and I/l pairs can look very similar in some fonts like Arial
    $uniquePart = strtoupper(base_convert($id, 10, 33));
    // base_convert(.., .., 33) will return the string of the following
    // 33 characters: "0123456789abcdefghijklmnopqrstuvw"
    // "xyz" characters are left off to replace "0OI" characters
    // we want to exclude.
    $uniquePart = strtr($uniquePart, '0OI', 'XYZ');
    $randomPart = generateRandomString($randomPartLength, $allowedCharacters);
    return $randomPart . $uniquePart;
}
It makes invitation codes slowly grow in size as $id gets bigger, but throws no exceptions. It also saves an extra character by making the separator unnecessary.
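Because the random part has a fixed length in this variant, the ID can be recovered from a code by cutting off the first $randomPartLength characters and reversing the character substitution. A minimal sketch, assuming codes produced by the function above; encodeUniquePart and extractIdFromCode are hypothetical names, not part of the original snippet:

```php
<?php
// Hypothetical helper mirroring the unique-part encoding used above.
function encodeUniquePart(int $id): string
{
    $uniquePart = strtoupper(base_convert((string) $id, 10, 33));
    return strtr($uniquePart, '0OI', 'XYZ');
}

// Hypothetical inverse: strip the random prefix, undo the substitution,
// and convert the base-33 string back to a decimal ID.
function extractIdFromCode(string $code, int $randomPartLength): int
{
    $uniquePart = substr($code, $randomPartLength);
    $base33 = strtolower(strtr($uniquePart, 'XYZ', '0OI'));
    return (int) base_convert($base33, 33, 10);
}

// 'ABCDE' stands in for a 5-character random part.
$code = 'ABCDE' . encodeUniquePart(1234);
echo extractIdFromCode($code, 5), PHP_EOL; // prints 1234
```

One possible use is to look up the invitation by the extracted ID and then compare the full stored code, so that a code with a guessed random part is rejected.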
Upvotes: 1
Reputation: 450
The shortest useful hash algorithm would be MD5. MD5 generates a 16-byte (128-bit) hash. Base64 encoding carries 6 useful bits per character,
so you should be able to reduce the MD5 to 22 characters (after dropping the trailing padding introduced by Base64).
An added advantage is that the same string can serve as a legal filename. You will have to replace the default / and + characters with symbols that do not clash with the file naming conventions of your OS.
Replacing / and + also ensures that the hash does not introduce special characters into a URL.
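The steps described above can be sketched in PHP as follows. The function name shortMd5 is hypothetical, and the choice of - and _ as replacements follows the common URL-safe Base64 alphabet; any other pair that is safe on your OS would work as well.

```php
<?php
// Hypothetical helper: 22-character, URL- and filename-safe MD5 digest.
function shortMd5(string $input): string
{
    $raw = md5($input, true);         // 16 raw bytes instead of 32 hex characters
    $b64 = base64_encode($raw);       // 24 characters, ending in "=="
    $safe = strtr($b64, '+/', '-_');  // replace the characters unsafe in URLs and filenames
    return rtrim($safe, '=');         // drop the padding, leaving 22 characters
}

echo shortMd5('invite-42'), PHP_EOL;
echo strlen(shortMd5('invite-42')), PHP_EOL; // prints 22
```

Note that 22 characters is still well above the 8-character limit in the question, so this only helps when a longer code is acceptable.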
Upvotes: 2
Reputation: 432
If you want to be assured of never having a collision, your best bet is to maintain a database of valid hashes and compare against that database when generating new hashes.
If you think you will have a high volume, you may want to pre-generate the hashes so that you have a "haystack" of them ready to use. Some people do this with random numbers because hardware random number generators can only produce numbers at a certain rate.
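A minimal sketch of the check-before-issue idea, assuming an in-memory array stands in for the database of valid codes. The function name generateUnusedCode and the alphabet are illustrative choices; in a real system the lookup would be a database query, ideally backed by a unique index.

```php
<?php
// Hypothetical helper: draws random codes until one is not yet in $issuedCodes.
// $issuedCodes stands in for the database of valid codes (code => true).
function generateUnusedCode(array &$issuedCodes, int $length = 8): string
{
    $alphabet = '123456789ABCDEFGHJKLMNPQRSTUVWXYZ';
    $maxIndex = strlen($alphabet) - 1;
    do {
        $code = '';
        for ($i = 0; $i < $length; $i++) {
            $code .= $alphabet[random_int(0, $maxIndex)];
        }
    } while (isset($issuedCodes[$code])); // collision: try again
    $issuedCodes[$code] = true;
    return $code;
}

$issuedCodes = [];
for ($i = 0; $i < 5; $i++) {
    echo generateUnusedCode($issuedCodes), PHP_EOL;
}
```

Running the same loop ahead of time and storing the results would give the pre-generated pool mentioned above.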
Upvotes: 2
Reputation: 50010
You can use substr on a SHA1 or MD5 hash. The chance of a collision with a truncated hash is the same as for a hash designed to produce that shorter length.
Or if all you really want is to generate a unique key, you can do something like this:
define('KEY_CHARS', 'acefghjkpqrstwxyz23456789'); // characters which cannot be confused phonetically or by bad handwriting
function generateKey($len = 8) {
    $k = str_repeat('.', $len);
    while ($len--) {
        // random_int() (PHP 7+) is cryptographically secure, unlike mt_rand()
        $k[$len] = substr(KEY_CHARS, random_int(0, strlen(KEY_CHARS) - 1), 1);
    }
    return $k;
}
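For the substr approach mentioned at the start of this answer, the following rough sketch truncates a SHA1 digest and estimates the collision risk with the usual birthday approximation. Both function names are hypothetical, and the estimate assumes the truncated hashes are uniformly distributed.

```php
<?php
// Hypothetical helper: first $length hex characters of a SHA1 digest.
function truncatedHash(string $input, int $length = 8): string
{
    return substr(sha1($input), 0, $length);
}

// Birthday approximation: chance that at least two of $count uniformly
// distributed values collide in a space of 16 ** $hexLength values.
function collisionProbability(int $count, int $hexLength): float
{
    $space = 16 ** $hexLength;
    return 1 - exp(-$count * ($count - 1) / (2 * $space));
}

echo truncatedHash('abc'), PHP_EOL; // prints a9993e36
printf("%.3f%s", collisionProbability(10000, 8), PHP_EOL);
printf("%.3f%s", collisionProbability(100000, 8), PHP_EOL);
```

Under this approximation, 10,000 codes of 8 hex characters have a little over a 1% chance of containing a collision, and 100,000 codes roughly a 69% chance, so a uniqueness check is still advisable with truncated hashes.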
Upvotes: 1