Craig Harp

Reputation: 43

Using this custom hashing function, are there any obvious security flaws I am missing?

<?php
function md7($text, $len)
{
  if($text)
  {
    $split = str_split(md5($text).md5(strlen($text)).md5($len));
    foreach($split as $key=>$value)
    {
      $md5 = $md5.md5($value);
    }
    $split2 = str_split($md5);
    foreach($split2 as $kl=>$vl)
    {
      if($kl < $len)
      {
        $digest = $digest.$split2[$kl];
      }   
    }
    return $digest;
  }
} // end md7 function
?>

I created this function to combine MD5 with a variable-length hash. I believe this reduces the chance of a collision: I have tested the known examples of MD5 collisions, and they do not collide under this function. I also believe this function is not susceptible to rainbow table attacks.

Upvotes: 3

Views: 943

Answers (3)

Gumbo

Reputation: 655677

Besides the fact that your function does not return anything for "" and "0", the final hash value consists only of the MD5 hash values of the hexadecimal characters 0-9 and a-f, which are:

string(32) "cfcd208495d565ef66e7dff9f98764da"
string(32) "c4ca4238a0b923820dcc509a6f75849b"
string(32) "c81e728d9d4c2f636f067f89cc14862c"
string(32) "eccbc87e4b5ce2fe28308fd9f2a7baf3"
string(32) "a87ff679a2f3e71d9181a67b7542122c"
string(32) "e4da3b7fbbce2345d7772b0674a318d5"
string(32) "1679091c5a880faf6fb5e6087eb1b2dc"
string(32) "8f14e45fceea167a5a36dedd4bea2543"
string(32) "c9f0f895fb98ab9159f51fd0297e236d"
string(32) "45c48cce2e2d7fbdea1afc51c7c6ad26"
string(32) "0cc175b9c0f1b6a831c399e269772661"
string(32) "92eb5ffee6ae2fec3ad71c777531578f"
string(32) "4a8a08f09d37b73795649038408b5f33"
string(32) "8277e0910d750195b448797616e091ad"
string(32) "e1671797c52e15f763380b45e841ec32"
string(32) "8fa14cdd754f91cc6554c9e71929cce7"

If we pass the same 16 characters through md7($c, 32), we get:

NULL
string(32) "4a8a08f09d37b73795649038408b5f33"
string(32) "4a8a08f09d37b73795649038408b5f33"
string(32) "e1671797c52e15f763380b45e841ec32"
string(32) "0cc175b9c0f1b6a831c399e269772661"
string(32) "e1671797c52e15f763380b45e841ec32"
string(32) "c4ca4238a0b923820dcc509a6f75849b"
string(32) "c9f0f895fb98ab9159f51fd0297e236d"
string(32) "4a8a08f09d37b73795649038408b5f33"
string(32) "a87ff679a2f3e71d9181a67b7542122c"
string(32) "cfcd208495d565ef66e7dff9f98764da"
string(32) "45c48cce2e2d7fbdea1afc51c7c6ad26"
string(32) "a87ff679a2f3e71d9181a67b7542122c"
string(32) "c9f0f895fb98ab9159f51fd0297e236d"
string(32) "e1671797c52e15f763380b45e841ec32"
string(32) "c9f0f895fb98ab9159f51fd0297e236d"

Again, 0 returns NULL. But more interesting is that the characters in each of the sets {1, 2, 8}, {3, 5, e}, {7, d, f}, and {9, c} result in the same hash value. This is because their MD5 hash values begin with the same hexadecimal digit. And as you take those digits one by one to build your final MD7 hash, the first 32 hexadecimal characters of the resulting hash are identical, too.
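The grouping can be reproduced with a short script. This is only a sketch: it assumes md7 hashes the concatenated MD5 strings one character at a time (as the outputs above imply), and it condenses the question's function by initialising $md5 and using substr() in place of the second loop.

```php
<?php
// Condensed md7 (assumption: one character per md5() call; substr()
// replaces the second loop and gives the same result for $len > 0).
function md7($text, $len)
{
    if ($text) {
        $md5 = '';
        $source = md5($text) . md5((string) strlen($text)) . md5((string) $len);
        foreach (str_split($source) as $value) {
            $md5 .= md5($value);
        }
        return substr($md5, 0, $len);
    }
}

// Group the single-character inputs by their 32-character md7 output.
// "0" is skipped because md7 returns NULL for it.
$groups = array();
foreach (str_split('123456789abcdef') as $c) {
    $groups[md7($c, 32)][] = $c;
}

// Print every output that is shared by more than one input.
foreach ($groups as $hash => $chars) {
    if (count($chars) > 1) {
        echo $hash, ' <= ', implode(', ', $chars), "\n";
    }
}
```

Each printed line shows one output followed by all inputs that map to it.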

So for a length of ≤ 32, there are only 16 possible MD7 hashes, for ≤ 64 there are 16^2 MD7 hashes, and so on. The maximum length is 3·32·32 = 3072 and the number of possible MD7 hash values is 16^(3·32).

But the final 3072-character MD7 hash doesn’t have 16^96 entropy, because the last 1024 characters are derived from the length of the resulting MD7 hash, which is already known. That leaves only 2048 unknown characters.
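Both observations are easy to check. The sketch below assumes the same condensed, one-character-at-a-time md7 as described in this answer; the input strings are arbitrary examples.

```php
<?php
// Condensed md7 (assumption: one character per md5() call; substr()
// replaces the second loop and gives the same result for $len > 0).
function md7($text, $len)
{
    if ($text) {
        $md5 = '';
        $source = md5($text) . md5((string) strlen($text)) . md5((string) $len);
        foreach (str_split($source) as $value) {
            $md5 .= md5($value);
        }
        return substr($md5, 0, $len);
    }
}

// Asking for more than 3072 characters still yields 3 * 32 * 32 = 3072.
$capped = md7('some text', 5000);
echo strlen($capped), "\n";

// Two unrelated inputs with the same $len share their last 1024 characters,
// because that part is built from md5($len) alone.
$a = md7('some text', 3072);
$b = md7('completely different', 3072);
$same_tail = substr($a, 2048) === substr($b, 2048);
var_dump($same_tail);
```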

As the hashing loop can be reversed, one can also obtain the initial MD5 of the text and of its length.

Here’s an example:

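function md7_info($hash) {
    $hashlen = strlen($hash);

    // Lookup table: MD5 of each hexadecimal digit => that digit.
    $md5_to_hex = array();
    foreach (str_split('0123456789abcdef') as $c) $md5_to_hex[md5($c)] = $c;

    // Characters 0..1023 encode md5($text), one 32-character block per digit.
    $md5_text = '';
    foreach (str_split(substr($hash, 0, 1024), 32) as $h) $md5_text .= $md5_to_hex[$h];

    // Characters 1024..2047 encode md5(strlen($text)) in the same way.
    $md5_textlen = '';
    foreach (str_split(substr($hash, 1024, 1024), 32) as $h) $md5_textlen .= $md5_to_hex[$h];

    // Expects a hash of at least 2048 characters; shorter input leaves partial blocks.
    return array($md5_text, $md5_textlen, $hashlen);
}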

The MD5 of the original text length can also be reversed, since there are only a few plausible lengths to try. So the final remaining information is the MD5 of the text and the text length. Not much security is gained over plain MD5. In fact, the knowledge of the text length reveals additional information.
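As a rough end-to-end sketch, the following recovers md5($text) from a full-length MD7 hash and then brute-forces the text length. The condensed md7, the sample text, and the upper bound of 100000 on the length search are assumptions made for illustration.

```php
<?php
// Condensed md7 (assumption: one character per md5() call).
function md7($text, $len)
{
    if ($text) {
        $md5 = '';
        $source = md5($text) . md5((string) strlen($text)) . md5((string) $len);
        foreach (str_split($source) as $value) {
            $md5 .= md5($value);
        }
        return substr($md5, 0, $len);
    }
}

// md7_info as given above: undo the per-character hashing with a lookup table.
function md7_info($hash)
{
    $md5_to_hex = array();
    foreach (str_split('0123456789abcdef') as $c) {
        $md5_to_hex[md5($c)] = $c;
    }
    $md5_text = '';
    foreach (str_split(substr($hash, 0, 1024), 32) as $h) {
        $md5_text .= $md5_to_hex[$h];
    }
    $md5_textlen = '';
    foreach (str_split(substr($hash, 1024, 1024), 32) as $h) {
        $md5_textlen .= $md5_to_hex[$h];
    }
    return array($md5_text, $md5_textlen, strlen($hash));
}

$text = 'correct horse';            // arbitrary sample input
$hash = md7($text, 3072);
list($md5_text, $md5_textlen, $hashlen) = md7_info($hash);

// Brute-force the text length: try candidate lengths until the MD5 matches.
$recovered_len = null;
for ($n = 1; $n <= 100000; $n++) {
    if (md5((string) $n) === $md5_textlen) {
        $recovered_len = $n;
        break;
    }
}

var_dump($md5_text === md5($text)); // the plain MD5 of the text is exposed
var_dump($recovered_len);           // and so is the text length
```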

Upvotes: 3

Emil Vikström

Reputation: 91983

Here are a few obvious security flaws:

if($text)

Both "0" and "" return the same result: no hash at all, because both strings are falsy in PHP.

foreach($split as $key=>$value)
{
  $md5 = $md5.md5($value);
}

You are hashing each individual character of the first hash. With $len <= 32, this means you get the exact same hash for all strings where the first character of md5($text) is the same, essentially decreasing the entropy from 128 bits down to 4. Here is a complete list of all 16 hash values with $len = 32:

8f14e45fceea167a5a36dedd4bea2543
92eb5ffee6ae2fec3ad71c777531578f
a87ff679a2f3e71d9181a67b7542122c
e4da3b7fbbce2345d7772b0674a318d5
c81e728d9d4c2f636f067f89cc14862c
8277e0910d750195b448797616e091ad
0cc175b9c0f1b6a831c399e269772661
45c48cce2e2d7fbdea1afc51c7c6ad26
4a8a08f09d37b73795649038408b5f33
e1671797c52e15f763380b45e841ec32
eccbc87e4b5ce2fe28308fd9f2a7baf3
c4ca4238a0b923820dcc509a6f75849b
8fa14cdd754f91cc6554c9e71929cce7
c9f0f895fb98ab9159f51fd0297e236d
1679091c5a880faf6fb5e6087eb1b2dc
cfcd208495d565ef66e7dff9f98764da

Note that this problem is not mitigated by choosing a $len > 32. With $len up to 64 you only add the second character of the original hash, which gives you 4 more bits of entropy (8 bits in total), which equals 256 different hashes.
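A small experiment illustrates the point. It is a sketch that assumes md7 hashes one character at a time; the "input-N" strings are arbitrary test data.

```php
<?php
// Condensed md7 (assumption: one character per md5() call; substr()
// replaces the second loop and gives the same result for $len > 0).
function md7($text, $len)
{
    if ($text) {
        $md5 = '';
        $source = md5($text) . md5((string) strlen($text)) . md5((string) $len);
        foreach (str_split($source) as $value) {
            $md5 .= md5($value);
        }
        return substr($md5, 0, $len);
    }
}

// With $len = 64 the output is fully determined by the first two
// hexadecimal digits of md5($text), so at most 16 * 16 = 256 values exist.
$seen = array();
$mismatches = 0;
for ($i = 1; $i <= 5000; $i++) {
    $text = "input-$i";
    $h = md5($text);
    $expected = md5($h[0]) . md5($h[1]);
    if (md7($text, 64) !== $expected) {
        $mismatches++;
    }
    $seen[$expected] = true;
}

echo count($seen), " distinct outputs from 5000 inputs\n";
echo $mismatches, " outputs not explained by the first two digits\n";
```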

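I am pretty sure that you need a length of 32·32 = 1024 just to match the entropy of the original md5, because each of its 32 hexadecimal characters is expanded into a 32-character block. That is 32 times longer than the hash it encodes.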


If security is what you want, go with a well-defined and well-tested hashing function. PHP has a sha1 function, as well as a number of other algorithms available through the hash function.

Hash functions are typically created and reviewed by the cryptographic community. They are much better than almost any simple hack you can come up with, so do not implement your own hashing function but use one of the available ones.
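For instance, a minimal sketch of using the built-in hash function with SHA-256 (the algorithm choice and the sample text are only examples):

```php
<?php
// Hash a string with one of the algorithms PHP already ships with.
$text = 'The quick brown fox';
$digest = hash('sha256', $text);

echo $digest, "\n";
echo strlen($digest), " hexadecimal characters\n";

// hash_algos() lists every algorithm the installed PHP build supports.
$supported = in_array('sha256', hash_algos(), true);
var_dump($supported);
```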

Upvotes: 4

Michael

Reputation: 2261

There is an issue here that might be worth considering, but I have no easy way of analyzing md5 hashes.

Let's consider this in two parts. The first part takes the input and hashes it, its length, and the requested length of the return value. I'm not sure what the effect of an array concatenated with a string is, but I'm going to assume it's just copied for each instance.

Then you iterate through each piece and build up a new value: $md5 = $md5.md5($value). Since you return only the first $len bytes of that string, this seems like the biggest weakness.

A couple of effects: $md5 depends mostly on the beginning of the input. Try something like

md7("aaaaaaaaaaaaaaaaaaaaaaaa", 16)
md7("aaaaaaaaaaaaaaaaaaaaaaab", 16)

and see if you get similar results. I haven't tried this, but my hunch is you don't have proper mixing of the entire input like you might want.
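Here is a sketch for running that experiment, assuming md7 hashes one character at a time. With $len = 16 the result is just the first 16 characters of the MD5 of the first hexadecimal digit of md5($text), so the two outputs are either identical or unrelated; which one happens depends on those first digits rather than on how similar the inputs look.

```php
<?php
// Condensed md7 (assumption: one character per md5() call; substr()
// replaces the second loop and gives the same result for $len > 0).
function md7($text, $len)
{
    if ($text) {
        $md5 = '';
        $source = md5($text) . md5((string) strlen($text)) . md5((string) $len);
        foreach (str_split($source) as $value) {
            $md5 .= md5($value);
        }
        return substr($md5, 0, $len);
    }
}

$first  = "aaaaaaaaaaaaaaaaaaaaaaaa";
$second = "aaaaaaaaaaaaaaaaaaaaaaab";

$a = md7($first, 16);
$b = md7($second, 16);
echo $a, "\n", $b, "\n";

// Each output depends only on the first hexadecimal digit of md5($text).
$digit_a = substr(md5($first), 0, 1);
$digit_b = substr(md5($second), 0, 1);
var_dump($a === substr(md5($digit_a), 0, 16));
var_dump($b === substr(md5($digit_b), 0, 16));
```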

Upvotes: 1
