caw

Reputation: 31489

Represent MD5 hash as an integer

In my user database table, I take the MD5 hash of the email address of a user as the id.

Example: email([email protected]) = id(d41d8cd98f00b204e9800998ecf8427e)

Unfortunately, I have to represent the ids as integer values now - in order to be able to use an API where the id can only be an integer.

Now I'm looking for a way to encode the id into an integer for sending and decode it again when receiving. How could I do this?

My ideas so far:

  1. convert_uuencode() and convert_uudecode() for the MD5 hash
  2. replace every character of the MD5 hash by its ord() value

Which approach is better? Do you know even better ways to do this?

I hope you can help me. Thank you very much in advance!

Upvotes: 20

Views: 63110

Answers (9)

codebard

Reputation: 146

A simple solution could use hexdec() for conversions for parts of the hash.

Systems that can accommodate 64-bit Ints can split the 128-bit/16-byte md5() hash into four 4-byte sections and then convert each into representations of unsigned 32-bit Ints. Each hex pair represents 1 byte, so use 8 character chunks:

$hash = md5($value);

foreach (str_split($hash, 8) as $chunk) {
    $int_hashes[] = hexdec($chunk);
}

On the other end, use dechex() to convert the values back:

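$original_hash = '';
foreach ($int_hashes as $ihash) {
    // dechex() drops leading zeros, so pad each chunk back to 8 hex digits
    $original_hash .= str_pad(dechex($ihash), 8, '0', STR_PAD_LEFT);
}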

Caveat: Due to underlying deficiencies with how PHP handles integers and how it implements hexdec() and intval(), this strategy will not work with 32-bit systems.

Edit Takeaways:

  • Ints in PHP are always signed; there are no unsigned Ints.

  • Although intval() may be useful for certain cases, hexdec() is more performant and simpler to use for base-16.

  • hexdec() converts values above 7fffffffffffffff into Floats, making its use moot for splitting the hash into two 64-bit/8-byte chunks.

  • Similarly for intval($chunk, 16), it returns the same Int value for 7fffffffffffffff and above.
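The limits in these takeaways can be checked directly. The following is a minimal sketch, assuming a 64-bit PHP build (PHP_INT_SIZE of 8); the variable names are only illustrative.

```php
<?php
// Assumes a 64-bit PHP build; on 32-bit builds the thresholds are lower.
$fits     = hexdec('ffffffff');          // 8 hex digits: the largest 32-bit chunk
$maxInt   = hexdec('7fffffffffffffff');  // the largest value that is still an int
$overflow = hexdec('8000000000000000');  // one above PHP_INT_MAX

var_dump(is_int($fits));       // an 8-character chunk stays an int
var_dump(is_int($maxInt));     // still an int at exactly PHP_INT_MAX
var_dump(is_float($overflow)); // 16-character chunks can turn into floats

// intval() does not switch to float; it clamps at PHP_INT_MAX instead.
$clamped = intval('ffffffffffffffff', 16);
var_dump($clamped === PHP_INT_MAX);
```

This is why the hash is split into four 8-character chunks rather than two 16-character ones.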

Upvotes: 10

theking2

Reputation: 2803

Add these two columns to your table.

`email_md5_l` bigint(20) UNSIGNED GENERATED ALWAYS AS (conv(left(md5(`email`),16),16,10)) STORED,
`email_md5_r` bigint(20) UNSIGNED GENERATED ALWAYS AS (conv(right(md5(`email`),16),16,10)) STORED,

It might or might not help to create a PK on these two columns though, as it probably concatenates two string representations and hashes the result. It would kind of defeat your purpose and a full scan might be quicker but that depends on number of columns and records. Don't try to read these bigints in php as it doesn't have unsigned integers, just stay in SQL and do something like:

select email 
into result 
from `address`
where email_md5_l = conv(left(md5(the_email), 16), 16, 10)
  and email_md5_r = conv(right(md5(the_email), 16), 16, 10) 
limit 1;

MD5 does collide btw.

Upvotes: 0

Marcin

Reputation: 5579

What about:

$float = hexdec(md5('string'));

or

$int = (integer) (substr(hexdec(md5('string')),0,9)*100000000);

There is definitely a bigger chance of collision, but is it still good enough to use instead of the hash in the DB?

Upvotes: 1

humbads

Reputation: 3412

Use the email address as the file name of a blank, temporary file in a shared folder, like /var/myprocess/[email protected]

Then call ftok() on the file name. ftok() will return an integer key for that file.

The key is not guaranteed to be unique, but it will probably suffice for your API.
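A rough sketch of that idea follows. It assumes the system temp directory stands in for the shared folder, uses a placeholder email address, and picks 'a' as an arbitrary project id (ftok() requires one as its second argument). The key is derived from the file itself, so it only has meaning on the machine that holds the file, and it cannot be decoded back into the email.

```php
<?php
// Sketch only: the directory, file name and project id are assumptions.
$dir  = sys_get_temp_dir();            // stand-in for the shared folder
$path = $dir . '/user@example.com';    // placeholder email used as file name
touch($path);                          // create the blank file

$key = ftok($path, 'a');               // 'a' is an arbitrary one-character project id
if ($key === -1) {
    echo "ftok() failed\n";            // ftok() returns -1 on failure
}
```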

Upvotes: -1

bdonlan

Reputation: 231103

Be careful. Converting the MD5s to an integer will require support for big (128-bit) integers. Chances are the API you're using will only support 32-bit integers - or worse, might be dealing with the number in floating-point. Either way, your ID will get munged. If this is the case, just assigning a second ID arbitrarily is a much better way to deal with things than trying to convert the MD5 into an integer.

However, if you are sure that the API can deal with arbitrarily large integers without trouble, you can just convert the MD5 from hexadecimal to an integer. PHP most likely does not support this built-in however, as it will try to represent it as either a 32-bit integer or a floating point; you'll probably need to use the PHP GMP library for it.
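For illustration, a conversion with GMP might look like the sketch below. It assumes the GMP extension is loaded; md5_to_decimal() and decimal_to_md5() are hypothetical helper names, and the email address is a placeholder. The result is a decimal string, not a PHP int, so it only helps if the API accepts integers of that size.

```php
<?php
// Assumes the GMP extension (ext-gmp) is loaded.
// md5_to_decimal() and decimal_to_md5() are hypothetical helper names.

function md5_to_decimal(string $hex): string
{
    // Parse the 32 hex digits as one 128-bit number and print it in base 10.
    return gmp_strval(gmp_init($hex, 16), 10);
}

function decimal_to_md5(string $dec): string
{
    // Convert back to hex and restore any leading zeros that were dropped.
    return str_pad(gmp_strval(gmp_init($dec, 10), 16), 32, '0', STR_PAD_LEFT);
}

$hash = md5('user@example.com');  // placeholder input
$id   = md5_to_decimal($hash);    // decimal string of up to 39 digits
$back = decimal_to_md5($id);      // the original 32-character hash
```

The padding in decimal_to_md5() matters for hashes that begin with one or more zeros, since the numeric form does not keep them.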

Upvotes: 23

GZipp

Reputation: 5416

There are good reasons, stated by others, for doing it a different way.

But if what you want to do is convert an md5 hash into a string of decimal digits (which is what I think you really mean by "represent by an integer", since an md5 is already an integer in string form), and transform it back into the same md5 string:

function md5_hex_to_dec($hex_str)
{
    $arr = str_split($hex_str, 4);
    foreach ($arr as $grp) {
        $dec[] = str_pad(hexdec($grp), 5, '0', STR_PAD_LEFT);
    }
    return implode('', $dec);
}

function md5_dec_to_hex($dec_str)
{
    $arr = str_split($dec_str, 5);
    foreach ($arr as $grp) {
        $hex[] = str_pad(dechex($grp), 4, '0', STR_PAD_LEFT);
    }
    return implode('', $hex);
}

Demo:

$md5 = md5('[email protected]');
echo $md5 . '<br />';  // 23463b99b62a72f26ed677cc556c44e8
$dec = md5_hex_to_dec($md5);
echo $dec . '<br />';  // 0903015257466342942628374306682186817640
$hex = md5_dec_to_hex($dec);
echo $hex;             // 23463b99b62a72f26ed677cc556c44e8

Of course, you'd have to be careful using either string, like making sure to use them only as string type to avoid losing leading zeros, ensuring the strings are the correct lengths, etc.

Upvotes: 10

Alexey Sviridov

Reputation: 3490

Why ord()? md5 produces a normal 16-byte value, presented to you in hex for better readability. You can't convert a 16-byte value to a 4- or 8-byte integer without loss, so you must change some part of your algorithm to use this as an id.
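To make the loss concrete, here is a small sketch assuming a 64-bit PHP build. The helper md5_to_lossy_int() is hypothetical and keeps only the first 15 hex digits (60 bits) so that the result always fits in a positive int. The second input is not a real hash, just a string sharing those 15 digits.

```php
<?php
// Sketch: the remaining 68 bits are discarded, so the hash cannot be
// recovered and distinct hashes can map to the same integer.
function md5_to_lossy_int(string $hex): int   // hypothetical helper name
{
    return hexdec(substr($hex, 0, 15));
}

$first  = 'd41d8cd98f00b204e9800998ecf8427e';
$second = str_pad(substr($first, 0, 15), 32, 'f'); // same prefix, different tail

$a = md5_to_lossy_int($first);
$b = md5_to_lossy_int($second);

var_dump($first === $second); // the two strings differ
var_dump($a === $b);          // yet both collapse to the same integer
```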

Upvotes: 2

SeanJA

Reputation: 10354

Couldn't you just add another field that was an auto-increment int field?

Upvotes: 1

Malax

Reputation: 9604

You could use hexdec to parse the hexadecimal string and store the number in the database.

Upvotes: 1
