Reputation: 665
It is well known how to iterate over a hash in Perl (see, e.g., What's the safest way to iterate through the keys of a Perl hash?). However, the order of the keys and values is undefined and in fact differs between runs of the same Perl script.
Is there a way to ensure that every run of the same Perl script on the same input data results in the same iteration order? I only care about replicability in this sense; the order need not be predictable by a human being.
EDIT: I formulated the question in terms of iteration, but maybe it is not the iteration over the hash but the hash-building process that is non-deterministic. Can I set some initialisation to build the hash in a deterministic and replicable way?
Upvotes: 1
Views: 571
Reputation: 53498
sort them first:
foreach my $key ( sort keys %hash ) {
    # process $hash{$key} here; keys arrive in sorted order
}
Note: The default sort compares keys as strings, not numerically. But sort will take a custom comparison function, so you can sort in almost any order you care to name.
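As a minimal sketch of what a custom comparison can look like, the following contrasts the default string sort with a numeric sort and a sort by value. The hash %count_for and its contents are made-up sample data, not part of the question.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen so string and numeric order differ.
my %count_for = ( 10 => 'ten', 9 => 'nine', 100 => 'hundred' );

# Default sort compares keys as strings, so "10" and "100" come before "9".
my @as_strings = sort keys %count_for;

# Numeric comparison with the <=> operator.
my @as_numbers = sort { $a <=> $b } keys %count_for;

# Sort keys by their values, falling back to the key itself on ties
# so that the result is fully determined.
my @by_value = sort { $count_for{$a} cmp $count_for{$b} or $a cmp $b }
    keys %count_for;

print "@as_strings\n";
print "@as_numbers\n";
print "@by_value\n";
```

Any of these gives the same order on every run, because the order comes from the comparison and not from the hash internals.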
Alternatively, capture the ordering in an array and use that to extract the output order.
my %content_for;
my @ordered_id;
while ( <$input_filehandle> ) {
    my ( $id, $content ) = split;
    push ( @ordered_id, $id );
    $content_for{$id} = $content;
}
print join ( "\n", @content_for{@ordered_id} ), "\n";
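One thing to watch with this approach: if an ID can occur more than once in the input, it gets pushed onto the array each time. A possible variant, assuming you want each ID listed once at the position where it first appeared, is sketched below. The in-memory input string is invented for illustration so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical input: "id content" pairs, with the id "b" repeated.
my $input = "b first\na second\nb third\n";
open my $input_filehandle, '<', \$input
    or die "Cannot open in-memory input: $!";

my %content_for;
my @ordered_id;
while ( my $line = <$input_filehandle> ) {
    my ( $id, $content ) = split ' ', $line;

    # Record each ID only the first time it is seen.
    push @ordered_id, $id unless exists $content_for{$id};

    # A later line with the same ID overwrites the earlier content.
    $content_for{$id} = $content;
}
close $input_filehandle;

# Hash slice: values are returned in the order given by @ordered_id.
print join( "\n", @content_for{@ordered_id} ), "\n";
```

Here the output order follows first appearance in the input, so it is the same on every run for the same data.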
Or use an ordered-hash module such as Hash::Ordered or Tie::IxHash.
"I formulated the question in terms of iteration, but maybe it is not the iteration over the hash but the hash-building process that is non-deterministic. Can I set some initialisation to build the hash in a deterministic and replicable way?"
No. Hashes don't work like that. See perlsec for an explanation of why. Key order became more randomized in newer versions of Perl (notably from 5.18 onward), but a hash was always an unordered data structure.
You can perhaps mess around with PERL_HASH_SEED and PERL_PERTURB_KEYS (both mentioned in perlsec), but this is definitely not good practice.
PERL_HASH_SEED=0 ./somescript.pl
But you should bear in mind that hash ordering is still not guaranteed: the sequence of keys may still change, although it will be a bit more consistent than before. This is definitely not something to use in production, or to rely on for anything more than debugging.
PLEASE NOTE: The hash seed is sensitive information. Hashes are randomized to protect against local and remote attacks against Perl code. By manually setting a seed, this protection may be partially or completely lost.
Upvotes: 7