ErikR
ErikR

Reputation: 52039

Suggestion for a large hash table (2^25 elements)

I want to write a birthday attack program in Haskell for a variant of SHA1 which only produces only a 50 bit hash. To do this I need a hash table capable of storing approx. 2^25 entries.

The keys in this map will be Int64 and the values will be short length strings (~ 16 bytes).

Any suggestions for which hash implementation to use?

(Disregard that last update - I would need a bit array of 2^50 elements.)

Upvotes: 2

Views: 1632

Answers (2)

Edward Kmett
Edward Kmett

Reputation: 29982

For 2^25 entries at 8 bytes a piece, you are looking at something like 768MB of storage for just the data, at most probably around 3 gigabytes with actual overhead for storing bytestrings -- guesstimating 80 bytes per bytestring, then you have the hashtable/map's internals to store, and the boxing for the key, etc.

This means you can store the entire thing resident in memory on a decent machine, which keeps the problem relatively sane, but your collection times will kind of suck.

I would recommend using a lot of smaller hash tables, by partitioning your keyspace, that way you can run lots of the updates in parallel regardless of the hash table you use.

As for implementation:

You can either wrap a bunch of immutable hash tables like the wide-fanout ones from unordered-containers in IORefs and use some kind of atomicModifyIORef or something like ryan newton's compare and swap primitive, or you can try to use the old Data.HashTable implementation in a straightforward imperative manner.

The latter will improve your asymptotics by a logarithmic factor over the hash-array mapped tries used by unordered-containers, but Data.HashTable has bad constants. At the scale of your problem these factors probably cancel out though.

Upvotes: 6

Arpssss
Arpssss

Reputation: 3858

I also posted same sort of question. And from some suggestion, I am using Kyoto Cabinet. It is pretty cool and gives nice performance also. You can check my posts also because I have similar issues. EX. one, two and three. Perhaps this may helpful.

Upvotes: 2

Related Questions