syker
syker

Reputation: 11272

How do I sort a Perl hash that has tons of data?

I am sorting a hash in Perl. I encountered an Out of memory error when running my Perl Script:

foreach $key (sort (keys(%hash))) {
   ....
}

How do I sort a hash that has tons of data?

Upvotes: 2

Views: 2333

Answers (3)

salva
salva

Reputation: 10244

If your keys are integers, numbers or strings of a small maximum size, you can use Sort::Packed:

use Sort::Packed qw(sort_packed);

my $hash_size = keys %hash;
my $max_key_len = 4;  
my $packed_keys = '\0' x ($max_key_len * $hash_size);
my $ix = 0;
while (my ($key, $value) = each %hash) {
  my $key_len = length $k;
  $key_len <= $max_key_len or die "key $key is too big";
  substr($packed_keys, $ix, $key_len, $key);
  $ix += $max_key_len;
}

sort_packed("C$max_key_len", $packed_keys);

$ix = 0;
while ($ix < length $packed_keys) {
  my $key = substr($packed_keys, $ix, $max_key_len);
  $key =~ s/\0+$//;
  print "$key\n";
  $ix += $max_key_len;
}

Admittedly, this code is quite ugly, but it will keep memory usage to the minimum.

Upvotes: 0

Schwern
Schwern

Reputation: 164829

sort keys %hash is inefficient for a large %hash in that, memory wise, its roughly equivalent to:

my @keys = keys %hash;
@keys = sort @keys;

In that it has to keep three copies of the keys in memory while doing the sorting (one in the hash, one in the list of keys, one in the sorted list being created). foreach's memory optimizations for iterators do not apply.

Since the hash is so large, the best option is to get it entirely out of memory. Stick it in a BerkeleyDB file. And if you want to keep the keys in order a hash isn't the best option, a tree is. I'd suggest using a Berkeley BTree file. Trees will efficiently keep your data sorted like an array while providing fast lookup like a hash.

Here's an example using BerkeleyDB. DB_File is simpler and better documented but does not take advantage of modern features of BerkeleyDB. YMMV.

use BerkeleyDB;

my $db  = tie my %hash, 'BerkeleyDB::Btree',
              -Filename => "your.db",
              -Compare  => sub { $_[1] cmp $_[0] },
              -Flags    => DB_CREATE;

-Compare illustrates how to supply your own sorting function. The tied interface will be sluggish. Unless you need it to act like a hash, use the object interface.

Upvotes: 13

Space
Space

Reputation: 7259

Perl FAQ has some examples to sort a hash. Look at How do I sort a hash? and here is A Fresh Look at Efficient Perl Sorting.

Upvotes: 0

Related Questions