Zuiq Pazu
Zuiq Pazu

Reputation: 295

Hashing and 'brute-force' permutations

So this is a two-part question:

  1. Are there any hashing functions that guarantee that for any combination of the same length, they generate a unique hash? As I remember - most are that way, but I just need to confirm this.

  2. Based on the 1st question - so, given a file hash and a length - is it then theoretically possible to 'brute-force' all byte permutations of that same length until the same hash is generated - ie. the original file has been recreated?

PS. I am aware that this will take ages (if theoretically possible), but I think it would be feasible for small files (sizes < 1KB)

Upvotes: 2

Views: 378

Answers (2)

Patrick M
Patrick M

Reputation: 10989

Expanding upon the response to your first point, one of the points of cryptographic hash functions is unpredictability. A function with zero collisions is a 1-1 (or one-to-one) function, so called because every input has exactly one output and every output has exactly one input.

In order for a function to accept arbitrary length & complexity inputs without generating a collision, it is easy to see that the function must have arbitrary length outputs. As Gray obliquely points out, most hash functions have fixed-length outputs. (There are apparently some new algorithms that support arbitrary length outputs, but they still don't guarantee 0 collisions.) The reason is not stated clearly in the common crypto literature, but consider the difference between hashing and encrypting.

  • In hashing, you have the message (the unaltered original) and the message digest (the output of the hash function. (Digest here having the meaning "a summation or condensation of a body of information.")
  • With encryption, you have the plain text and the cipher text. The implication is that the cipher text is of equal length and complexity as the original.

I look at it as a cryptographic hash function with 0 collisions is of equal complexity as encryption. (Note that I'm unsure of what the advantages of a variable-length hash output are, so I asked a question about it.)

Additionally, hash functions are susceptible to attacks by pre-computed rainbow tables, which is why all hash algorithms still considered secure employ extra random inputs, called salts. The reason encryption isn't susceptible to a similar attack is that the encryption key is kept secret and you can't pre-compute output values without knowing the key. Compare symmetric key encryption (where there is one key that must be kept secret) with public key encryption (where the encryption key is public and the decryption key is private).

The other thing that prevents encryption algorithms from pre-computation attacks is that the number of computations for arbitrary-length inputs grows exponentially, and it is literally impossible to store the output from every input you may be interested in.

Upvotes: 0

Gray
Gray

Reputation: 7140

1KB, that'd be 1000^256, right? 1000 possible combinations of bytes (256 configurations each?). It's a real big number. 1 with 768 0s behind it.

If you were to generate all of them, one would be the right one, but you'd have some number of collisions.

According to this security.SE post, the collission rate for md5 (for example) is about 1 in 2^64. So, if we divide our original number by that, we'd get how many possible combinations, right? http://www.wolframalpha.com/input/?i=1000%5E256+%2F+2%5E64

~5.42 × 10^748

That is still a lot of files to check.

I'd feel a lot better if someone critiqued my math here, but the point is that your first point is not true because of collisions. You can use the same sort math for calculating two 1000 character passwords having the same hash. It's the birthday problem. Given 2 people, it is unlikely that we'd have the same birthday, but if you take a room full the probability of any two people having the same birthday increases very quickly. If you take all 1000 character passwords, some of them are going to collide. You are going from X bytes to 16 bytes. You can't fit all of the combinations into 16 bytes.

Upvotes: 2

Related Questions