Reputation: 15802
MD5, MD6?, all the SHA-somethings, CRC-somethings. I've used them before and seen them used in various places, but I have no idea why you would use one over another.
On a very high level, what is the difference between all these 3/4 letter acronyms In terms of performance, collision probability and general hard-to-crackness? Does any of those depend on what kind or what amount of data I am hashing?
What trade-offs am I making when i choose one over another? I've read that the CRC is not suitable to use for security, but what about for general hash-table collision avoidance?
Upvotes: 2
Views: 2179
Reputation: 74382
To complete the other answers: performance varies among hash functions. Hash functions are built on elementary operations, which are more or less efficient, depending on the architecture. For instance, the SHA-3 candidate Skein uses additions on 64-bit integers and is very fast on platforms which offer 64-bit operations, but on 32-bit-only systems (including all ARM processors), Skein is much slower.
SHA-256 is usually said to be "slow" but will still hash data at a rate of about 150 megabytes per second on a basic PC (a 2.4 GHz Core2), which is more than enough for most applications. It is rare that hash function performance is really important on a PC. Things can be different on embedded systems (from smartcards to smartphones) where you could get more data to process than what the CPU can handle. MD5 will be typically 3 to 6 times faster than SHA-256. SHA-256 is still the recommended default choice, since its security is still intact; consider using something else only if you get a real, duly constated and measured performance issue.
On small 32-bit architectures (MIPS, ARM...), all remaining SHA-3 candidates are slower than SHA-256, so getting something faster and yet secure could be challenging.
Upvotes: 1
Reputation: 489998
CRC-whatever is used primarily (should be exclusively) for protection against accidental changes in data. They do quite a good job of detecting noise and such, but are not intended for cryptographic purposes -- finding a second preimage (a second input that produces the same hash) is (by cryptographic standards) trivial. [Edit: As noted by @Jon, unlike the other hashes mentioned here, CRC is not and never was intended for cryptographic use.]
MD-5. Originally intended for cryptographic use, but fairly old and now considered fairly weak. Although no second preimage attack is known, a collision attack is known (i.e., a way to produce two selected inputs that produce the same result, but not a second input to produce the same result as one that's specified). About the only time to use this any more is as a more elaborate version of a CRC.
Once upon a time, there was simply "SHA". Very early in its history, a defect was found, and a slight modification was made to produce SHA-1. SHA was in use for a short enough time that it's rarely of practical interest.
SHA-1 is generally more secure than MD-5, but still in the same general range -- a collision attack is known, though it's a lot1 more expensive than for MD-5. No second preimage attack is known, but the collision attack is enough to say "stay away".
SHA-256, SHA-384, SHA-512: These are sort of based on SHA-1, but are somewhat more complex internally. At least as far as I'm aware, neither a second-preimage attack nor a collision attack is known on any of these at the present time.
SHA-3: US National Institute of Standards and Technology (NIST) is currently holding a competition to standardize a replacement for the current SHA-2 series hash algorithm, which will apparent be called SHA-3. As I write this (September 2011) the competition is currently in its third round, with five candidates (Blake, Grøstl, JH, Kaccek and Skein2) left in the running. Round 3 is scheduled to be over in January 2012, at which time public comments on the algorithms will no longer be (at least officially) accepted. In March 2012, a (third) SHA-3 conference will be held (in Washington DC). At some unspecified date later in 2012, the final selection will be announced.
1 For anybody who cares about how much more expensive it is to attack SHA-1 than MD-5, I'll try to give some concrete numbers. For MD-5, my ~5 year-old machine can produce a collision in about 40-45 minutes. For SHA-1, I only have an estimate, but my estimate is that a cluster to produce collisions at a rate of one per week would cost well over a million US dollars (and probably closer to $10 million). Even given an existing machine, the cost of operating the machine long enough to find a collision is substantial.
2 Since it's almost inevitable that somebody will at least wonder, I'll point out that the entry Bruce Schneier worked on is Skein.
Upvotes: 5
Reputation: 101139
Here's the really short summary:
Upvotes: 1