fugu

Reputation: 6578

How are hash keys sorted?

Until today, I thought that hash keys were randomised when returned. However, the entry in perldoc on keys suggests otherwise:

Hash entries are returned in an apparently random order. The actual random order is specific to a given hash; the exact same series of operations on two hashes may result in a different order for each hash.

Other questions on SO relate to this; the most pertinent is the accepted answer to this question.

The following code returns apparently randomised hash keys:

 use strict;
 use warnings;
 use feature 'say';   # say is not enabled by default

 my %hash;
 $hash{$_}++ for 1 .. 100;

 say for keys %hash;

Can anyone help me understand how this is not actually random?

Upvotes: 1

Views: 194

Answers (1)

Sinan Ünür

Reputation: 118156

The order in which hash keys are returned depends on the version of perl you are using. You should not depend on any apparent order in any version. That includes assuming that whatever randomization is involved would be suitable for other purposes where the statistical quality of the randomness actually matters.
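If a stable order is needed, one common approach is to impose it explicitly rather than rely on whatever keys happens to return. A minimal sketch, reusing the hash from the question (the name @ordered is only illustrative):

```perl
use strict;
use warnings;
use feature 'say';

my %hash;
$hash{$_}++ for 1 .. 100;

# keys returns the keys in an unspecified order, so sort them explicitly.
# <=> compares numerically; the default comparison (cmp) suits string keys.
my @ordered = sort { $a <=> $b } keys %hash;

say for @ordered;
```

Sorting costs a little extra time, but the output no longer depends on the perl version or on the hash seed.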

From the "Hash overhaul" section of the perl 5.18.0 release notes (perl5180delta):

Hash overhaul

Changes to the implementation of hashes in perl v5.18.0 will be one of the most visible changes to the behavior of existing code.

By default, two distinct hash variables with identical keys and values may now provide their contents in a different order where it was previously identical.

When encountering these changes, the key to cleaning up from them is to accept that hashes are unordered collections and to act accordingly.

Hash randomization

The seed used by Perl's hash function is now random. This means that the order which keys/values will be returned from functions like keys(), values(), and each() will differ from run to run.

This change was introduced to make Perl's hashes more robust to algorithmic complexity attacks, and also because we discovered that it exposes hash ordering dependency bugs and makes them easier to track down.

Toolchain maintainers might want to invest in additional infrastructure to test for things like this. Running tests several times in a row and then comparing results will make it easier to spot hash order dependencies in code. Authors are strongly encouraged not to expose the key order of Perl's hashes to insecure audiences.

Further, every hash has its own iteration order, which should make it much more difficult to determine what the current hash seed is.
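The per-hash ordering described in the quoted notes can be observed with a small experiment. This is only a sketch: the helper name key_order is made up for illustration, and whether the two orders actually differ on a given run or perl version is not guaranteed.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: the keys of a hash, joined in iteration order.
sub key_order {
    my ($h) = @_;
    return join ',', keys %$h;
}

my (%first, %second);
$first{$_}  = 1 for 'a' .. 'j';
$second{$_} = 1 for 'a' .. 'j';

# Same keys, same sequence of operations, yet the iteration order
# may differ between the two hashes and from one run to the next.
say 'first:  ', key_order(\%first);
say 'second: ', key_order(\%second);

# The sorted views always agree, whatever the iteration order was.
my $same = join(',', sort keys %first) eq join(',', sort keys %second);
say 'sorted views match: ', $same ? 'yes' : 'no';
```

Running the script several times and comparing the first two lines of output is a quick way to see the behaviour on a particular perl.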

Upvotes: 4
