Sir Cornflakes

Reputation: 665

Replicable iteration over a hash in perl

It is well known how to iterate over a hash in Perl (see, e.g., What's the safest way to iterate through the keys of a Perl hash?). However, the order of the keys and values is unspecified and, in fact, differs between runs of the Perl script.

Is there a way to ensure that every run of the same Perl script on the same input data produces the same iteration order? I only care about replicability in this sense; the order need not be predictable by a human being.

EDIT: I formulated the question in terms of iteration, but perhaps it is not the iteration over the hash but the hash-building process that is non-deterministic. Can I set some initialisation so that the hash is built in a deterministic and replicable way?

Upvotes: 1

Views: 571

Answers (1)

Sobrique

Reputation: 53498

Sort the keys first:

foreach my $key ( sort keys %hash ) {
    # Keys come out in the same (string-sorted) order on every run.
    print "$key => $hash{$key}\n";
}

Note: the default sort compares keys as strings (so "10" sorts before "9"), not numerically. However, sort accepts a custom comparison block or subroutine, so you can sort in almost any order you care to name.
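A short sketch of the difference between the default string sort and custom comparison blocks. The %score hash and its contents are hypothetical sample data. The last sort adds a tie-breaker on the key name, so keys with equal values also come out in a fixed order on every run.

```perl
use strict;
use warnings;

# Hypothetical sample data; two keys share the value 10.
my %score = ( alice => 10, bob => 9, carol => 100, dave => 10 );

# Default sort: string comparison, so "100" comes before "9".
my @by_string = sort values %score;

# Numeric sort with the <=> operator.
my @by_number = sort { $a <=> $b } values %score;

# Keys ordered by value (descending), ties broken by key name,
# so the result is fully deterministic.
my @by_value_desc = sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;

print "@by_string\n";      # 10 10 100 9
print "@by_number\n";      # 9 10 10 100
print "@by_value_desc\n";  # carol alice dave bob
```

Without the "or $a cmp $b" tie-breaker, alice and dave could still appear in either order, since sort is handed the keys in hash order.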

Alternatively, record the order in an array while you build the hash, and use that array to produce the output in the same order.

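my %content_for;
my @ordered_id;

# Record each id in the order it is read; the hash stores its content.
while ( <$input_filehandle> ) {
    my ( $id, $content ) = split;    # split $_ on whitespace
    push( @ordered_id, $id );
    $content_for{$id} = $content;
}

# Print the content in input order via a hash slice.
print join( "\n", @content_for{@ordered_id} ), "\n";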
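A self-contained variant of the array approach, as a sketch: it reads hypothetical sample data from an in-memory filehandle instead of a real file, and (unlike the snippet above) records each id only the first time it is seen, so a repeated id does not produce a duplicate output line. The later line still updates that id's content.

```perl
use strict;
use warnings;

# Hypothetical sample input: one "id content" pair per line; a1 repeats.
my $data = "b2 beta\na1 alpha\nc3 gamma\na1 alpha-updated\n";
open( my $input_filehandle, '<', \$data ) or die "Cannot open in-memory file: $!";

my %content_for;
my @ordered_id;

while ( my $line = <$input_filehandle> ) {
    my ( $id, $content ) = split ' ', $line;
    # Remember only the first appearance of each id.
    push @ordered_id, $id unless exists $content_for{$id};
    $content_for{$id} = $content;
}
close $input_filehandle;

# Output follows input order, so it is identical on every run.
my $output = join( "\n", @content_for{@ordered_id} ) . "\n";
print $output;
```

This prints beta, alpha-updated and gamma, in that order, regardless of the hash's internal ordering.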

Or use an ordered-hash module such as Hash::Ordered or Tie::IxHash, which remembers insertion order.

As for the edit: "Can I set some initialisation to build the hash in a deterministic and replicable way?"

No. Hashes don't work like that. See perlsec (the "Algorithmic Complexity Attacks" section) for an explanation of why. Hash ordering became more random in newer versions of Perl (notably from 5.18 onward), but a hash has always been an unordered data structure.

You could experiment with the PERL_HASH_SEED and PERL_PERTURB_KEYS environment variables (mentioned in perlsec and documented in perlrun), but this is definitely not good practice. For example:

PERL_HASH_SEED=0 ./somescript.pl 

Bear in mind that hash ordering is still not guaranteed: the key sequence may still change, for example between Perl versions or builds, or when keys are inserted in a different order. It will be more consistent than before, though. Do not use this in production, and do not rely on it for anything beyond debugging.
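For debugging, the effect can be checked with a sketch like the one below. It is an illustration under stated assumptions, not a guarantee: it assumes a Unix-like system (list-form pipe open) and a perl that honours PERL_HASH_SEED and PERL_PERTURB_KEYS as described in perlrun. The child program and the helper name child_key_order are hypothetical. It runs the same perl binary twice with a fixed seed and compares the raw, unsorted key order.

```perl
use strict;
use warnings;

# Hypothetical child program: fills a hash and prints its keys in
# Perl's native (unsorted) iteration order.
my $child = 'my %h; @h{ "a" .. "t" } = (); print join(",", keys %h), "\n";';

sub child_key_order {
    # Debugging only: fixed seed, no key-order perturbation.
    # This weakens Perl's protection against hash-based attacks.
    local $ENV{PERL_HASH_SEED}    = '0';
    local $ENV{PERL_PERTURB_KEYS} = '0';

    # List-form pipe open starts a fresh perl ($^X) without a shell.
    open( my $fh, '-|', $^X, '-e', $child ) or die "Cannot start $^X: $!";
    my $order = do { local $/; <$fh> };
    close($fh) or die "Child perl failed with status $?";
    return $order;
}

my $first  = child_key_order();
my $second = child_key_order();
print $first eq $second ? "Same order in both runs: $first" : "Orders differ\n";
```

The matching order only holds for this perl binary and this insertion sequence; a different Perl version, build, or input can still produce a different order.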

PLEASE NOTE: The hash seed is sensitive information. Hashes are randomized to protect against local and remote attacks against Perl code. By manually setting a seed, this protection may be partially or completely lost.

Upvotes: 7
