Karel Bílek

Reputation: 37658

What is better in Perl: an array of hash references or a list of "flat" hashes?

I cannot decide which approach is more (1) idiomatic Perl, (2) efficient, or (3) "clear".

Let me explain with code. First, I can do

sub something {
  ...
  $ref->{size}   = 10;
  $ref->{name}   = "Foo";
  $ref->{volume} = 100;
  push (@references, $ref);
  ...
  return @references;
}

or, I can do

sub something {
  ...
  push (@names, "Foo");
  $sizes{Foo}   =  10;
  $volumes{Foo} = 100;
  ...
  return (\@names, \%sizes, \%volumes);
}

Both do essentially the same thing. The important thing is, I need the array, because I need to keep the order.

I know, there is always more than one way to do something, but still, which one of these two would you prefer?

Upvotes: 4

Views: 657

Answers (4)

brian d foy

Reputation: 132832

Both approaches can be useful, for different problems. If you are always going to access all of the information together, just keep it together. In your case, you want to track the name, size, and volume of each item. You're probably working with all three of those things at the same time, so keep them together as an array of hash references.

Other times, you might split the data into separate pieces that you use on their own and want to look up independently of the other columns. In those cases, separate hashes might make sense.
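
To make the contrast concrete, here is a minimal sketch of both layouts built from the same records. The second record ("Bar") and its values are made up for illustration; they are not from the question.

```perl
use strict;
use warnings;

# Layout 1: records kept together; the array preserves the order.
my @items = (
    { name => 'Foo', size => 10, volume => 100 },
    { name => 'Bar', size => 20, volume => 250 },    # hypothetical second record
);

# Everything about one record is reachable from a single variable.
my @report = map { "$_->{name}: size $_->{size}, volume $_->{volume}" } @items;

# Layout 2: one hash per column, keyed by name, plus an array for the order.
my @names   = map { $_->{name} } @items;
my %sizes   = map { ( $_->{name} => $_->{size} ) } @items;
my %volumes = map { ( $_->{name} => $_->{volume} ) } @items;

# An independent lookup touches only the hash it needs.
my $size_of_bar = $sizes{Bar};

print "$_\n" for @report;
print "Bar has size $size_of_bar\n";
```

Note that the second layout only works if names are unique, since they serve as hash keys.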

Upvotes: 0

daotoad

Reputation: 27183

Keep your related data together. The only reason to create big parallel arrays is that you are forced to.

If you are concerned about speed and memory usage, you can use constant array indexes to access your named fields:

use constant { SIZE => 0, NAME => 1, VOLUME => 2, };

sub something {
  ...

  $ref->[SIZE]   = 10;
  $ref->[NAME]   = "Foo";
  $ref->[VOLUME] = 100;

  push @references, $ref;

  ...
  return @references;
}

I've also added some whitespace to make the code easier to read.

If I have a lot of parameters with validation rules and/or deep data structures, I tend to look to objects to simplify my code by tying the logic about the data to the data itself. Of course, OOP exacts a speed penalty, but I have only rarely seen that become a problem.

For quick and dirty OOP, I use Class::Struct, which has many flaws. For situations where I need type checking, I use Moose, or Mouse when memory or startup speed is a big concern.
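
As a rough sketch of the quick-and-dirty route, this is what the record from the question might look like with Class::Struct. The class name Item is an assumption chosen for illustration.

```perl
use strict;
use warnings;
use Class::Struct;

# Declare a hypothetical Item class with three scalar ('$') fields.
struct( Item => { name => '$', size => '$', volume => '$' } );

my @items = (
    Item->new( name => 'Foo', size => 10, volume => 100 ),
);

# Accessors replace hash-key lookups; a mistyped field name dies as a
# missing method instead of silently returning undef.
for my $item (@items) {
    printf "%s: size %d, volume %d\n", $item->name, $item->size, $item->volume;
}

# Calling an accessor with an argument sets the field.
$items[0]->size(12);
```

The array still carries the order; only the element type changes from a plain hash reference to an object.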

Upvotes: 2

Sinan Ünür

Reputation: 118148

Instead of thinking in meaningless terms such as "something", think about and phrase the issue in concrete terms. In this case, you seem to be returning a list of objects that have name, size, and volume attributes. When you think of it that way, there is no reason to even consider the second method.

You can think of optimizations later if you run into problems, but even if you do, you would probably gain more from Memoize than by exploding data structures.
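
For reference, a minimal sketch of what using Memoize looks like. The function volume_for and its arithmetic are hypothetical stand-ins for an expensive computation; caching like this only makes sense for functions whose result depends solely on their arguments.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

# Hypothetical expensive lookup; the arithmetic stands in for real work.
sub volume_for {
    my ($size) = @_;
    $calls++;
    return $size * 10;
}

# Replace volume_for with a caching wrapper keyed on its arguments.
memoize('volume_for');

my $first  = volume_for(10);    # computed by the original function
my $second = volume_for(10);    # served from the cache

print "result $first, function body ran $calls time(s)\n";
```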

One efficiency improvement I will recommend is to return a reference from this subroutine:

sub get_objects {
    my @ret;

    while ( 'some condition' ) {    # placeholder: substitute a real loop test
        # should I return this one?
        push @ret, {
            name => 'Foo',
            size => 10,
            volume => 100,
        };
    }

    return \@ret;
}
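
A runnable variant of the same idea, together with the calling side, might look like the sketch below. The @raw input array and the "Bar" record are assumptions standing in for whatever the real loop would read.

```perl
use strict;
use warnings;

# Hypothetical input standing in for the real data source.
my @raw = ( [ 'Foo', 10, 100 ], [ 'Bar', 20, 250 ] );

sub get_objects {
    my @ret;
    for my $row (@raw) {
        my ( $name, $size, $volume ) = @$row;
        push @ret, { name => $name, size => $size, volume => $volume };
    }
    return \@ret;    # a single reference is returned; the list is not copied
}

my $objects = get_objects();

# Dereference to iterate; the array order is preserved.
for my $obj (@$objects) {
    print "$obj->{name}: size $obj->{size}, volume $obj->{volume}\n";
}

my $count = @$objects;    # an array in scalar context gives the element count
```

The caller pays for one extra dereference (@$objects) in exchange for not copying every element on return.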

Upvotes: 10

Thomas

Reputation: 181805

I vastly prefer the former. It keeps one "packet" of data (size, name, volume) together and makes for much more readable code.

Upvotes: 5
