Li Haoyi
Li Haoyi

Reputation: 15802

characteristics of various hash algorithms?

MD5, MD6?, all the SHA-somethings, CRC-somethings. I've used them before and seen them used in various places, but I have no idea why you would use one over another.

On a very high level, what is the difference between all these 3/4 letter acronyms In terms of performance, collision probability and general hard-to-crackness? Does any of those depend on what kind or what amount of data I am hashing?

What trade-offs am I making when i choose one over another? I've read that the CRC is not suitable to use for security, but what about for general hash-table collision avoidance?

Upvotes: 2

Views: 2179

Answers (3)

Thomas Pornin
Thomas Pornin

Reputation: 74382

To complete the other answers: performance varies among hash functions. Hash functions are built on elementary operations, which are more or less efficient, depending on the architecture. For instance, the SHA-3 candidate Skein uses additions on 64-bit integers and is very fast on platforms which offer 64-bit operations, but on 32-bit-only systems (including all ARM processors), Skein is much slower.

SHA-256 is usually said to be "slow" but will still hash data at a rate of about 150 megabytes per second on a basic PC (a 2.4 GHz Core2), which is more than enough for most applications. It is rare that hash function performance is really important on a PC. Things can be different on embedded systems (from smartcards to smartphones) where you could get more data to process than what the CPU can handle. MD5 will be typically 3 to 6 times faster than SHA-256. SHA-256 is still the recommended default choice, since its security is still intact; consider using something else only if you get a real, duly constated and measured performance issue.

On small 32-bit architectures (MIPS, ARM...), all remaining SHA-3 candidates are slower than SHA-256, so getting something faster and yet secure could be challenging.

Upvotes: 1

Jerry Coffin
Jerry Coffin

Reputation: 489998

CRC-whatever is used primarily (should be exclusively) for protection against accidental changes in data. They do quite a good job of detecting noise and such, but are not intended for cryptographic purposes -- finding a second preimage (a second input that produces the same hash) is (by cryptographic standards) trivial. [Edit: As noted by @Jon, unlike the other hashes mentioned here, CRC is not and never was intended for cryptographic use.]

MD-5. Originally intended for cryptographic use, but fairly old and now considered fairly weak. Although no second preimage attack is known, a collision attack is known (i.e., a way to produce two selected inputs that produce the same result, but not a second input to produce the same result as one that's specified). About the only time to use this any more is as a more elaborate version of a CRC.

SHA-x

Once upon a time, there was simply "SHA". Very early in its history, a defect was found, and a slight modification was made to produce SHA-1. SHA was in use for a short enough time that it's rarely of practical interest.

SHA-1 is generally more secure than MD-5, but still in the same general range -- a collision attack is known, though it's a lot1 more expensive than for MD-5. No second preimage attack is known, but the collision attack is enough to say "stay away".

SHA-256, SHA-384, SHA-512: These are sort of based on SHA-1, but are somewhat more complex internally. At least as far as I'm aware, neither a second-preimage attack nor a collision attack is known on any of these at the present time.

SHA-3: US National Institute of Standards and Technology (NIST) is currently holding a competition to standardize a replacement for the current SHA-2 series hash algorithm, which will apparent be called SHA-3. As I write this (September 2011) the competition is currently in its third round, with five candidates (Blake, Grøstl, JH, Kaccek and Skein2) left in the running. Round 3 is scheduled to be over in January 2012, at which time public comments on the algorithms will no longer be (at least officially) accepted. In March 2012, a (third) SHA-3 conference will be held (in Washington DC). At some unspecified date later in 2012, the final selection will be announced.


1 For anybody who cares about how much more expensive it is to attack SHA-1 than MD-5, I'll try to give some concrete numbers. For MD-5, my ~5 year-old machine can produce a collision in about 40-45 minutes. For SHA-1, I only have an estimate, but my estimate is that a cluster to produce collisions at a rate of one per week would cost well over a million US dollars (and probably closer to $10 million). Even given an existing machine, the cost of operating the machine long enough to find a collision is substantial.

2 Since it's almost inevitable that somebody will at least wonder, I'll point out that the entry Bruce Schneier worked on is Skein.

Upvotes: 5

Nick Johnson
Nick Johnson

Reputation: 101139

Here's the really short summary:

  1. MD4, MD5, SHA-1 and SHA-2 all belong to a category of general purpose secure hashing algorithms. They're generally interchangeable, in that they all produce a hashcode from a plaintext such that it's designed to be computationally infeasible to determine a plaintext that generates a hash (a 'preimage'), or to find two texts that hash to the same thing (a 'collision'). All of them are broken to various degrees, or at least believed to be vulnerable.
  2. MD6 was a candidate for NIST's SHA-3 competition, but it was withdrawn. It has the same general characteristics of the above hash functions, but like many of the SHA-3 candidates, adds extra features - in this case a merkle-tree like structure for improving parallelization of hashes. It goes without saying that it, along with the remaining SHA-3 candidates, are not yet well tested.
  3. A CRC is in fact not a hash algorithm at all. Its name stands for Cyclic Redundancy Check, and it's a checksum rather than a hash. Different CRCs are designed to resist different levels of transmission errors, but they all have in common a guarantee that they will detect a certain number of bit errors, something hash algorithms do not share. They're not as well distributed as a hash algorithm, so shouldn't be used as one.
  4. There are a range of general purpose hash algorithms suitable for use in hashtables etcetera, such as FNV. These tend to be a lot faster than cryptographic hashes, but aren't designed to resist attacks by an adversary. Unlike secure hashes, some of them show quite poor distribution characteristics, and are only suitable for hashing certain types of data.

Upvotes: 1

Related Questions