chicane

Reputation: 2071

General Question about MD5

So I'm just playing around with PHP and the MD5 functionality. Sorry if this sounds really silly, but I can't seem to understand: how is it possible to represent an unlimited number of characters of input in a 32-character output? Is my logic sound here? Or is there a limit to the input that an MD5 function can take?

Thanks...

Upvotes: 1

Views: 324

Answers (7)

miku

Reputation: 187994

  • Analogy: Fingerprints.

  • How is it possible? Hash functions in general rely on the presence of certain properties ...

  • Is there a limit? Learn about md5 collision ...

Upvotes: 1

symcbean

Reputation: 48357

It's not only possible but an unavoidable fact that there are many messages which will result in the same hash. These are usually referred to as collisions. But it's very, VERY hard to find them. A hash is just a function which generates a result which is effectively impossible to predict without knowing the original input.

Note that while some people (even those trying to answer your question) think that MD5 is insecure, the reality is that it is still more than adequate for most purposes, although I'd recommend one of the more recent hashes if you run PayPal or control the launchpads for a fleet of nuclear weapons.

(and before anyone starts flaming me with silly replies, tell me what I hashed to get: b958cf404456ceb1302015102ec57a64 )

C.

Upvotes: 0

Reece45

Reputation: 2779

It's possible to have a collision with any hashing algorithm. You simply can't represent all of the information in the input in the fixed amount of space that the hash uses. Otherwise we'd all be using hashing algorithms instead of compression algorithms.
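To make the "not enough space" argument concrete, here is a rough sketch using Python's standard `hashlib` (my choice for illustration, not something from the question). It cuts MD5 down to its first 2 bytes, so only 65,536 outputs exist, and a collision has to turn up within 65,537 distinct inputs. The helper names `truncated_md5` and `find_truncated_collision` are made up for this example, and this says nothing about how hard collisions are on the full 128-bit output.

```python
import hashlib
from itertools import count


def truncated_md5(data: bytes, nbytes: int = 2) -> bytes:
    """Return only the first `nbytes` bytes of the MD5 digest (a toy hash)."""
    return hashlib.md5(data).digest()[:nbytes]


def find_truncated_collision(nbytes: int = 2):
    """Hash b"0", b"1", b"2", ... until two different inputs share a tag."""
    seen = {}  # truncated digest -> first input that produced it
    for i in count():
        msg = str(i).encode()
        tag = truncated_md5(msg, nbytes)
        if tag in seen:
            return seen[tag], msg, tag
        seen[tag] = msg


first, second, tag = find_truncated_collision()
print(first, second, tag.hex())
```

The same counting argument applies to the full digest; there are just 2^128 buckets instead of 2^16, so stumbling on a collision by accident is far less likely.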

The chances of hitting a collision are very small. For things like passwords, the contents are usually very small. Other inputs that produce the same hash will likely be much larger, as well as gibberish. With an ISO, the colliding file might not even be bootable. An archive file probably will be unextractable.

MD5 has several ways for people to find collisions for a given hash. I'm sure other hashing algorithms do too. I believe MD5 has some collision problems where you can change a small amount with no change in the hash, which is why a lot of people don't recommend using it.

Some places also store the file-length (or content-length). That helps a bit with preventing collision attacks.

Upvotes: 1

Otávio Décio

Reputation: 74250

MD5 is not meant to be unique; rather, it can tell you whether a certain bit stream (a file, for example) was corrupted, either in transmission or on purpose. It is very unlikely that someone wanting to change a file in any way will be able to come up with the same MD5 value, which is why download sites use it to make sure you are getting the correct file.
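A download check along those lines might look roughly like this in Python (standard `hashlib`; the function names `md5_of_file` and `matches_published_md5` are hypothetical, and the published checksum is assumed to be a hex string copied from the download page):

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str, published_hex: str) -> bool:
    """Compare the local file's MD5 with the checksum the site published."""
    return md5_of_file(path) == published_hex.strip().lower()
```

You would call `matches_published_md5` with the downloaded file's path and the site's checksum; a `False` result means the file you have is not the file they hashed.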

Upvotes: 0

John Weldon

Reputation: 40739

I think you may be confusing an MD5 'hash' with compression or encryption.

A hash code is just a product of a process that goes through data and generates data that is likely to be unique for the given input. MD5 hashes don't contain all the data, just a probably unique representation of a 'thumbprint' of the data.
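A quick way to see that the hash is a thumbprint and not a container for the data: the output is the same size no matter how much goes in. This sketch uses Python's standard `hashlib` purely for illustration, and `digest_report` is a made-up helper name. MD5 always yields 16 bytes (128 bits), which is what you see written as 32 hexadecimal characters.

```python
import hashlib


def digest_report(sizes):
    """Return (input size, hex digest) pairs for inputs of b"a" repeated."""
    return [(size, hashlib.md5(b"a" * size).hexdigest()) for size in sizes]


for size, hexdigest in digest_report([0, 1, 1000, 1_000_000]):
    # Every digest is 32 hex characters, whatever the input size
    print(size, len(hexdigest), hexdigest)
```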

Upvotes: 2

Matthew Flaschen

Reputation: 284786

It's not. As with all hash functions, there are collisions, but they're supposed to be unpredictable and useless to attackers. However, MD5 is thoroughly compromised. A group successfully used an MD5 collision to create an unapproved certificate authority. Someone will note that there have been no preimage attacks in the wild, but I think it's time to bail on MD5.

Upvotes: 4

Pascal MARTIN

Reputation: 400902

An MD5 does not represent the whole content: it's only... well, how to say that in non-technical terms? Let's say an MD5 is some kind of short summary of your content.

A given content will always get you the same MD5, and a single bit of difference in the content will almost always get you a very different MD5, so MD5 (or another hashing algorithm) is often used as a way to check that a file has not been corrupted (during a transfer, for example).

But if you have an MD5, there is no way to get the content back: you cannot regenerate a content from its summary.
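The first two properties (same content, same hash; one flipped bit, very different hash) are easy to check for yourself. Here is a small sketch with Python's standard `hashlib`; the sample message and the helper names `md5_bits` and `differing_bits` are only for illustration.

```python
import hashlib


def md5_bits(data: bytes) -> int:
    """Return the 128-bit MD5 digest as an integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 128 digest bits differ between two inputs."""
    return bin(md5_bits(a) ^ md5_bits(b)).count("1")


original = b"The quick brown fox"
# Flip the lowest bit of the first byte: a one-bit change to the input
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(hashlib.md5(original).hexdigest())
print(hashlib.md5(original).hexdigest())  # same input, same digest
print(hashlib.md5(flipped).hexdigest())   # one bit changed in the input
print(differing_bits(original, flipped), "of 128 digest bits differ")
```

For a well-mixed hash you would expect roughly half of the output bits to change, though the exact count depends on the input.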

Upvotes: 3
