fhucho

Reputation: 34530

Fast hash function with unique hashes

I'm writing a disk cache where filenames are the keys. The keys can be longer than the max filename length, so they need to be hashed. What are some fast hash functions with extremely low probability of collisions (so that I can ignore it)?

Basically, I'm looking for a faster alternative to MD5 with no security requirements.

(Platform = Android, language = Java.)

Upvotes: 1

Views: 3497

Answers (2)

Christian

Reputation: 3721

The Google Guava library has several fast hash implementations:

http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/hash/Hashing.html#murmur3_128%28%29
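A minimal sketch of using Guava's murmur3_128 for the asker's use case (mapping an over-long cache key to a fixed-length filename), assuming Guava is on the classpath; the class and key are just illustrative names:

```java
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;

public class CacheKeyHash {
    // map a cache key (possibly longer than the max filename length)
    // to a fixed-length 128-bit hex string usable as a filename
    static String toFileName(String key) {
        return Hashing.murmur3_128()
                .hashString(key, StandardCharsets.UTF_8)
                .toString(); // 32 lowercase hex characters
    }

    public static void main(String[] args) {
        System.out.println(toFileName("https://example.com/some/very/long/resource?with=params"));
    }
}
```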

Upvotes: 0

andrew cooke

Reputation: 46862

if your hash is uniformly distributed then you can calculate the size of the hash (in bits) that you need from the approximate number of files you expect to handle before a collision. because of the birthday paradox, you need roughly twice as many bits as the log2 of that file count.

so, for example, if you are happy with a collision after a million files then you need a hash that is about 40 bits long (2 * log2(1e6) ≈ 40).

conversely, if a hash is N bits, then it's good for 2^(N/2) files without collision (more or less).
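the two rules of thumb above can be checked numerically; this is just stdlib arithmetic, with made-up method names:

```java
public class BirthdayBound {
    // bits of hash needed so a collision is unlikely before n files:
    // roughly 2 * log2(n), from the birthday paradox
    static double bitsNeeded(long n) {
        return 2 * (Math.log(n) / Math.log(2));
    }

    // conversely, an N-bit hash is good for about 2^(N/2) files
    static double filesSupported(int bits) {
        return Math.pow(2, bits / 2.0);
    }

    public static void main(String[] args) {
        System.out.printf("1e6 files  -> %.1f bits%n", bitsNeeded(1_000_000));  // ~39.9
        System.out.printf("64-bit hash -> %.0f files%n", filesSupported(64));   // ~4.3e9
    }
}
```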

there are many fast hashes. for example, xxhash is a 64 bit hash, so it's good for about 4,000,000,000 files. google's fast-hash is another.

if you want more than 64 bits (more than ~4 billion files before a collision) then you can either use a hash with a larger output, or join two 64 bit hashes together: one hash of the original key, and one of the key modified in some way (eg prefixed with a space).
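the join-two-hashes idea can be sketched like this; FNV-1a 64 is used below purely as a stand-in for any fast 64-bit hash (xxhash isn't in the Java stdlib), and the method names are invented for illustration:

```java
import java.nio.charset.StandardCharsets;

public class WideHash {
    // FNV-1a 64-bit: a simple, dependency-free 64-bit hash,
    // standing in here for any fast 64-bit hash (eg xxhash)
    static long fnv1a64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // 128-bit name: hash of the key joined with a hash of the key
    // modified in some way (here, prefixed with a space)
    static String toFileName(String key) {
        long h1 = fnv1a64(key.getBytes(StandardCharsets.UTF_8));
        long h2 = fnv1a64((" " + key).getBytes(StandardCharsets.UTF_8));
        return String.format("%016x%016x", h1, h2);
    }

    public static void main(String[] args) {
        System.out.println(toFileName("some/long/cache/key"));
    }
}
```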

Upvotes: 6
