fraillt
fraillt

Reputation: 318

Why does HashMap need a cryptographically secure hashing function?

I'm reading a Rust book about HashMap hashing functions, and I can't understand these two sentences.

By default, HashMap uses a cryptographically secure hashing function that can provide resistance to Denial of Service (DoS) attacks. This is not the fastest hashing algorithm available, but the trade-off for better security that comes with the drop in performance is worth it.

I know what a cryptographically secure hash function is, but don't I understand the rationale behind it. From my understanding a good hash function for HashMap should only have three properties:

Other properties, in cryptographically secure hash function, are not really relevant 99% (maybe even 99.99%) of the time for hash tables.

So my question is: What does "resistance to DoS attack and better security " even mean in the context of HashMap?

Upvotes: 18

Views: 3754

Answers (3)

Matthieu M.
Matthieu M.

Reputation: 299989

Let's start backward: how do you DoS a HashMap?

Over the years, there have been multiple attacks on various software stacks based on Hash Flooding. If you know which framework a site is powered by, and therefore which hash function is used, and this hash function is not cryptographically secure then you may be able to pre-compute, offline, a large set of strings hashing to the same number.

Then, you simply inject this set into the site, and for each (simple) request, it does a disproportionately large amount of work as inserting N elements takes O(N2) operations.


Rust was conceived with the benefit of hindsight, and therefore attention was paid to avoiding this attack by default, reasoning that users who really need performance out of HashMap would simply switch the hash function.

Upvotes: 23

Lukas Kalbertodt
Lukas Kalbertodt

Reputation: 88856

Let's say we use HashMap to store some user data in a web-application. Suppose that users can choose (part of) the key in some way – maybe the key is a username or a filename of an uploaded file or anything like that.

If we are not using a cryptographically secure hash function, this means that the attacker could possible craft multiple inputs that all map to the same output. Of course, a hash map has to deal with collisions, because they occur naturally.

But when unnaturally many collisions occur, the hash map implementation might do strange things. For example, looking up some keys could have a runtime of O(n). Or the hash map might think that it has to grow because of all the collisions; but growing won't solve the problem, so the hash map grows until all memory is used. In either case, it's bad. Hash maps just assume that statistically, collisions rarely occur.

Of course, this is not a "stealing user data" attack -- at least not directly. But if one part of a system is weak, this makes it easier for attackers to find other weaknesses.

A cryptographically secure hash function prevents this attack, since the attacker cannot possibly craft multiple keys that map to the same value (at least not without trying out all keys).


is not really relevant 99% (maybe even 99.99%) of the time for hash tables.

Yes, probably. But this is difficult to balance. I guess we all would agree that if 20% of users would have security problems in their application due to an unsecure hash function (while 80% don't care), it's still a good idea to use the "secure by default" approach. What about 5%/95%? What about 1%/99%? Hard to tell where the threshold is, right?

There has been a ton of discussion about this already. Because yes, most people only notice the slowness of the hash map. Maybe the situation I described above is incredibly rare and it isn't worth slowing down all other users' code by default. But this has been decided, the default hash function won't change, and luckily you can choose your own hash function.

Upvotes: 6

sepp2k
sepp2k

Reputation: 370357

If a server application stores user input (such as post data in a web application) in a hash table, a malicious user may try to provide a large number of inputs that all have the same hash value, leading to a large number of hash collisions and thus slowing down operations on the map significantly, to the point that it can be used as a DoS attack (as described in this article for example).

If the hash is cryptographically secure, attackers will have a much harder time trying to find inputs with the same hash value.

Upvotes: 2

Related Questions