llihttocs

Reputation: 2011

Adding hash keys

I am adding data to a hash using an incrementing numeric key starting at 0. The first key/value pair is fine. When I add the second one, the first key/value pair points back to the second. Each addition after that replaces the value of the second key and then points back to it. The Dumper output looks something like this.

$VAR1 = { '0' => { ... } }; 

That is after the first key/value pair is added. After the second one is added, I get:

$VAR1 = { '1' => { ... }, '0' => $VAR1->{'1'} };

After the third key/value is added, it looks like this.

$VAR1 = { '1' => { ... }, '0' => $VAR1->{'1'}, '2' => $VAR1->{'1'} };  

Why is it doing this? I want each key/value pair to show up in the hash, but when I iterate through the hash I get the same data for every key. How do I get rid of the references pointing back to the second key I added?

Upvotes: 2

Views: 1126

Answers (4)

Brad Gilbert

Reputation: 34130

This happens because you are storing a reference to the same hash every time, presumably inside a loop.

Here is a simple program which has this behaviour.

use strict;
use warnings;
# always use the above two lines until you
# understand completely why they are recommended

use Data::Printer;

my %hash;
my %inner; # <-- wrong place to put it

for my $index (0..5){
  $inner{int rand} = $index; # <- doesn't matter

  $hash{$index} = \%inner;
}

p %hash;

To fix it, make sure you create a fresh hash (and therefore a fresh reference) every time through the loop.

use strict;
use warnings;
use Data::Printer;

my %hash;

for my $index (0..5){
  my %inner; # <-- place the declaration here instead

  $inner{int rand} = $index; # <- doesn't matter

  $hash{$index} = \%inner;
}

p %hash;
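A minimal variant of the same fix, sketched here with core Perl only (no Data::Printer), skips the named %inner entirely and uses the anonymous hash constructor { ... }, which builds a new hash each time it is evaluated. The inner key name value is just for illustration.

```perl
use strict;
use warnings;

my %hash;

for my $index (0 .. 5) {
    # { ... } creates a brand-new anonymous hash on every pass,
    # so no two keys end up sharing one inner hash.
    $hash{$index} = { value => $index };   # 'value' is an illustrative key
}

for my $index (sort { $a <=> $b } keys %hash) {
    print "$index => $hash{$index}{value}\n";
}
```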

If you are only going to use numbers for your keys, and they are consecutive integers starting from 0, then I would recommend using an array.
An array would be faster and more memory-efficient.

use strict;
use warnings;
use Data::Printer;

my @array; # <--

for my $index (0..5){
  my %inner;
  $inner{int rand} = $index;

  $array[$index] = \%inner; # <--
}

p @array;

Upvotes: 0

ikegami

Reputation: 386501

You are setting the value of every element to a reference to the same hash. Data::Dumper is merely reflecting that.

If you're using Data::Dumper as a serializing tool (yuck!), then you should set $Data::Dumper::Purity to 1 to get something eval can process.

use Data::Dumper qw( Dumper );

my %h2 = (a=>5,b=>6,c=>7);

my %h;
$h{0} = \%h2;
$h{1} = \%h2;
$h{2} = \%h2;

print("$h{0}{c} $h{2}{c}\n");
$h{0}{c} = 9;
print("$h{0}{c} $h{2}{c}\n");

{
   local $Data::Dumper::Purity = 1;
   print(Dumper(\%h));
}

Output:

7 7
9 9
$VAR1 = {
          '0' => {
                   'c' => 9,
                   'a' => 5,
                   'b' => 6
                 },
          '1' => {},
          '2' => {}
        };
$VAR1->{'1'} = $VAR1->{'0'};
$VAR1->{'2'} = $VAR1->{'0'};
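To check that the Purity output really is something eval can process, here is a minimal sketch that round-trips the %h example above. It assumes the dump text is eval'd in a scope where a lexical $VAR1 has already been declared, since the dump assigns to that name.

```perl
use strict;
use warnings;
use Data::Dumper qw( Dumper );

my %h2 = (a => 5, b => 6, c => 7);
my %h  = (0 => \%h2, 1 => \%h2, 2 => \%h2);

my $text;
{
    local $Data::Dumper::Purity = 1;   # emit fix-up statements for shared refs
    $text = Dumper(\%h);
}

# The dump assigns to $VAR1, so declare it before eval'ing the text.
my $VAR1;
eval $text;
die $@ if $@;

# The rebuilt structure shares one inner hash again, like the original.
$VAR1->{0}{c} = 9;
print "$VAR1->{0}{c} $VAR1->{2}{c}\n";   # prints "9 9"
```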

If, on the other hand, you meant to store references to different hashes, you could use

# Shallow copies
$h{0} = { %h2 };  # { ... }   means   do { my %anon = ( ... ); \%anon }
$h{1} = { %h2 };
$h{2} = { %h2 };

or

# Deep copies
use Storable qw( dclone );
$h{0} = dclone(\%h2);
$h{1} = dclone(\%h2);
$h{2} = dclone(\%h2);

Output:

7 7
9 7
$VAR1 = {
          '0' => {
                   'a' => 5,
                   'b' => 6,
                   'c' => 9
                 },
          '1' => {
                   'a' => 5,
                   'b' => 6,
                   'c' => 7
                 },
          '2' => {
                   'a' => 5,
                   'b' => 6,
                   'c' => 7
                 }
        };
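The difference between the two kinds of copy only shows up once the inner hash itself holds references. A small sketch with hypothetical nested data (the tags array is not from the question):

```perl
use strict;
use warnings;
use Storable qw( dclone );

# Hypothetical data with one nested level: 'tags' is itself a reference.
my %h2 = (a => 5, tags => [ 'x', 'y' ]);

my $shallow = { %h2 };        # copies the top-level pairs only
my $deep    = dclone(\%h2);   # copies the whole structure

push @{ $h2{tags} }, 'z';     # modify the nested array in the original

print scalar(@{ $shallow->{tags} }), "\n";   # 3: the shallow copy shares the array
print scalar(@{ $deep->{tags} }), "\n";      # 2: the deep copy has its own array
```

So { %h2 } is enough when the values are plain scalars, as in the example above, while dclone is the safer choice for nested data.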

Upvotes: 4

Ilmari Karonen

Reputation: 50368

You haven't posted the actual code you're using to build the hash, but I assume it looks something like this:

foreach my $i (1 .. 3) {
    %hash2 = (number => $i, foo => "bar", baz => "whatever");
    $hash1{$i} = \%hash2;
}

(I'd guess that your real code reads data from a file in a while (<>) loop and assigns values to %hash2 based on it, but the foreach loop will do for demonstration purposes.)

If you run the code above and dump the resulting %hash1 using Data::Dumper, you'll get the output:

$VAR1 = {
          '1' => {
                   'baz' => 'whatever',
                   'number' => 3,
                   'foo' => 'bar'
                 },
          '3' => $VAR1->{'1'},
          '2' => $VAR1->{'1'}
        };

Why does it look like that? Well, it's because the values in %hash1 are all references pointing to the same hash, namely %hash2. When you assign new values to %hash2 in your loop, those values will overwrite the old values in %hash2, but it will still be the same hash. Data::Dumper is just highlighting that fact.
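One way to confirm that Data::Dumper is reporting shared references, rather than misdisplaying separate hashes, is to compare the addresses the references point to. A sketch using refaddr from the core Scalar::Util module; the helper name distinct_hashes is made up for this example:

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

my (%hash1, %hash2);

# Reproduce the problem: %hash2 is declared outside the loop,
# so every iteration stores a reference to the same hash.
foreach my $i (1 .. 3) {
    %hash2 = (number => $i, foo => "bar", baz => "whatever");
    $hash1{$i} = \%hash2;
}

# Hypothetical helper: counts how many different hashes the values point to.
# refaddr returns the address of the referenced data, so equal addresses
# mean the references share one underlying hash.
sub distinct_hashes {
    my ($h) = @_;
    my %seen;
    $seen{ refaddr($_) }++ for values %$h;
    return scalar keys %seen;
}

print distinct_hashes(\%hash1), "\n";                               # 1
print join(' ', map { $hash1{$_}{number} } sort keys %hash1), "\n"; # 3 3 3
```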

So, how can you fix it? Well, there are (at least) two ways. One way is to replace \%hash2, which gives a reference to %hash2, with { %hash2 }, which copies the contents of %hash2 into a new anonymous hash and returns a reference to that:

foreach my $i (1 .. 3) {
    %hash2 = (number => $i, foo => "bar", baz => "whatever");
    $hash1{$i} = { %hash2 };
}

The other (IMO preferable) way is to declare %hash2 as a (lexically scoped) local variable within the loop using my:

foreach my $i (1 .. 3) {
    my %hash2 = (number => $i, foo => "bar", baz => "whatever");
    $hash1{$i} = \%hash2;
}

This way, each iteration of the loop will create a new, different hash named %hash2, while the hashes created on previous iterations will continue to exist (since they're referenced from %hash1) independently.
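Since the question's real code probably reads from a file, here is a sketch of the same fix inside a while loop, using keys that start at 0 as in the question. The comma-separated record format is an assumption, and an in-memory filehandle stands in for the real file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical input: one "number,name" record per line.
my $input = "1,alpha\n2,beta\n3,gamma\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my %hash1;
my $key = 0;
while (my $line = <$fh>) {
    chomp $line;
    my ($number, $name) = split /,/, $line;

    # Declared inside the loop body, so each record gets a new hash.
    my %hash2 = (number => $number, name => $name);
    $hash1{$key++} = \%hash2;
}
close $fh;

print "$_: $hash1{$_}{name}\n" for sort { $a <=> $b } keys %hash1;
```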

By the way, you wouldn't have had this problem in the first place if you'd followed standard Perl best practices, specifically:

  • Always use strict; (and use warnings;). This would've forced you to declare %hash2 with my (although it wouldn't have forced you to do so inside the loop).

  • Always declare local variables in the smallest possible scope. In this case, since %hash2 is only used within the loop, you should've declared it inside the loop, like above.

Following these best practices, the example code above would look like this:

use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %hash1;
foreach my $i (1 .. 3) {
    my %hash2 = (number => $i, foo => "bar", baz => "whatever");
    $hash1{$i} = \%hash2;
}

print Dumper(\%hash1);

which, as expected, will print:

$VAR1 = {
          '1' => {
                   'baz' => 'whatever',
                   'number' => 1,
                   'foo' => 'bar'
                 },
          '3' => {
                   'baz' => 'whatever',
                   'number' => 3,
                   'foo' => 'bar'
                 },
          '2' => {
                   'baz' => 'whatever',
                   'number' => 2,
                   'foo' => 'bar'
                 }
        };

Upvotes: 2

David W.

Reputation: 107080

It's hard to see what the problem is when you don't post the code or the actual results of Data::Dumper.

There is one thing you should know about Data::Dumper: when you dump an array or (especially) a hash, you should dump a reference to it. Otherwise, Data::Dumper receives the flattened list of elements (or key/value pairs) and dumps each one as a separate variable. Also note that hashes do not preserve the order in which you add keys. I've included an example below. Make sure your issue isn't just confusing Data::Dumper output.

Another question: If you're keying your hash by sequential keys, would you be better off with an array?

If you can, please edit your question to post your code and the ACTUAL results.

use strict;
use warnings;
use autodie;
use feature qw(say);
use Data::Dumper;

my @array = qw(one two three four five);

my %hash = (one => 1, two => 2, three => 3, four => 4);

say "Dumped Array: " . Dumper @array;
say "Dumped Hash: " . Dumper %hash;
say "Dumped Array Reference: " . Dumper \@array;
say "Dumped Hash Reference: " . Dumper \%hash;

The output:

Dumped Array: $VAR1 = 'one';
$VAR2 = 'two';
$VAR3 = 'three';
$VAR4 = 'four';
$VAR5 = 'five';

Dumped Hash: $VAR1 = 'three';
$VAR2 = 3;
$VAR3 = 'one';
$VAR4 = 1;
$VAR5 = 'two';
$VAR6 = 2;
$VAR7 = 'four';
$VAR8 = 4;

Dumped Array Reference: $VAR1 = [
          'one',
          'two',
          'three',
          'four',
          'five'
        ];

Dumped Hash Reference: $VAR1 = {
          'three' => 3,
          'one' => 1,
          'two' => 2,
          'four' => 4
        };
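Because hash order is not preserved, comparing two dumps by eye can be confusing. Data::Dumper's Sortkeys option prints hash keys in sorted order, which makes the output stable between runs. A minimal sketch:

```perl
use strict;
use warnings;
use Data::Dumper qw( Dumper );

my %hash = (one => 1, two => 2, three => 3, four => 4);

# Sortkeys = 1 makes Dumper emit hash keys in sorted (string) order,
# so 'four', 'one', 'three', 'two' appear in that order on every run.
local $Data::Dumper::Sortkeys = 1;
my $out = Dumper(\%hash);
print $out;
```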

Upvotes: 0
