Niru

Reputation: 1547

Build hash of hash in perl

I'm new to Perl and I'm trying to build a hash of hashes from a TSV file. My current process is to read the file, construct a hash for each line, and then insert that hash into another hash.

   my %hoh = ();
   while (my $line = <$tsv>) 
   {
      chomp $line;
      my %hash;
      my @data = split "\t", $line;

      my $id;
      my $iter = each_array(@columns, @data);

      while(my($k, $v) = $iter->())
      {
         $hash{$k} = $v;
         if($k eq 'Id')
         {
            $id = $v;   
         }
      }

      $hoh{$id} = %hash;
   }
   print "dump: ", Dumper(%hoh);

This outputs:

dump
$VAR1 = '1234567890';
$VAR2 = '17/32';
$VAR3 = '1234567891';
$VAR4 = '17/32';
.....

Instead of what I would expect:

dump
{
   '1234567890' => { 
                    'k1' => 'v1',
                    'k2' => 'v2',
                    'k3' => 'v3',
                    'k4' => 'v4',
                    'id' => '1234567890'
                   },
   '1234567891' => { 
                    'k1' => 'v1',
                    'k2' => 'v2',
                    'k3' => 'v3',
                    'k4' => 'v4',
                    'id' => '1234567891'
                   },
     ........
};

My limited understanding is that when I do $hoh{$id} = %hash; it's inserting a reference to %hash? What am I doing wrong? Also, is there a more succinct way to use my columns and data arrays as key/value pairs in my %hash?

-Thanks in advance, Niru

Upvotes: 0

Views: 750

Answers (4)

cjm

Reputation: 62089

To get a reference to a hash variable, you need to use \%hash (as choroba said).

A more succinct way to assign values to columns is to assign to a hash slice, like this:

my %hoh = ();
while (my $line = <$tsv>) 
{
   chomp $line;
   my %hash;
   @hash{@columns} = split "\t", $line;
   $hoh{$hash{Id}} = \%hash;
}
print "dump: ", Dumper(\%hoh);

A hash slice (@hash{@columns}) means essentially the same thing as ($hash{$columns[0]}, $hash{$columns[1]}, $hash{$columns[2]}, ...) up to however many columns you have. By assigning to it, I'm assigning the first value from split to $hash{$columns[0]}, the second value to $hash{$columns[1]}, and so on. It does exactly the same thing as your while ... $iter loop, just without the explicit loop (and it doesn't extract the $id).
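To make the equivalence concrete, here is a small self-contained sketch. The column names and the sample line are made up for illustration; it fills one hash with a slice and another element by element, and both end up identical.

```perl
use strict;
use warnings;

# Hypothetical header names and one tab-separated line, for illustration only.
my @columns = ('Id', 'k1', 'k2');
my $line    = "1234567890\tv1\tv2";

# Hash slice assignment: first field -> $hash{Id}, second -> $hash{k1}, ...
my %hash;
@hash{@columns} = split /\t/, $line;

# The same assignment spelled out element by element.
my @fields = split /\t/, $line;
my %long;
($long{ $columns[0] }, $long{ $columns[1] }, $long{ $columns[2] }) = @fields;

# The Id is just an ordinary field now, so it can be read back afterwards.
print "id is $hash{Id}\n";
print "$_ => $hash{$_}\n" for sort keys %hash;
```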

There's no need to compare each $k to 'Id' inside a loop; just store it in the hash as a normal field and extract it afterwards with $hash{Id}. (Aside: Is your column header Id or id? You use Id in your loop, but id in your expected output.)

If you don't want to keep the Id field in the individual entries, you could use delete (which removes the key from the hash and returns the value):

$hoh{delete $hash{Id}} = \%hash;
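A minimal sketch of that line in isolation, using a hypothetical already-parsed row: delete both removes Id from the inner hash and supplies its value as the outer key.

```perl
use strict;
use warnings;

my %hoh;
# A hypothetical parsed row (field names are made up for illustration).
my %hash = (Id => '1234567890', k1 => 'v1', k2 => 'v2');

# delete removes the Id key from %hash and returns its value,
# which then becomes the key in %hoh.
$hoh{ delete $hash{Id} } = \%hash;

print join(', ', sort keys %{ $hoh{'1234567890'} }), "\n";   # prints "k1, k2"
```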

Upvotes: 2

shawnhcorey

Reputation: 3601

You should only store a reference to a variable if you do so in the last statement before the variable goes out of scope. In your script, %hash is declared inside the while loop, so placing this statement last in the loop is safe:

$hoh{$id} = \%hash;

If it's not the last statement (or you're not sure it's safe), create an anonymous structure to hold the contents of the variable:

$hoh{$id} = { %hash };

This makes a copy of %hash, which is slower, but any later changes to %hash will not affect what you stored.
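The difference between the two forms can be seen in a short sketch (the key names by_ref and by_copy are only illustrative): a stored reference sees later changes to %hash, while the anonymous copy does not.

```perl
use strict;
use warnings;

my %hash = (k1 => 'v1');
my %hoh;

$hoh{by_ref}  = \%hash;      # stores a reference: same underlying hash
$hoh{by_copy} = { %hash };   # anonymous hash built from a copy of the contents

# Modify the original after storing it.
$hash{k1} = 'changed';

print "by_ref:  $hoh{by_ref}{k1}\n";    # changed
print "by_copy: $hoh{by_copy}{k1}\n";   # v1
```

Note that { %hash } is a shallow copy: if %hash itself contained references, the copy would share those nested structures.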

Upvotes: 0

David W.

Reputation: 107030

Take a look at the documentation included with Perl. The perldoc command is very helpful, and the same documentation is also available on the Perldoc website.

One of the tutorials there is on Perl references (perlreftut). It will clear up a lot of your questions and explain referencing and dereferencing.
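As a quick taste of what that tutorial covers, here is a sketch of reading values back out of a hash of hashes. The ids and field names are made up for illustration.

```perl
use strict;
use warnings;

# A small hash of hashes with made-up ids and fields.
my %hoh = (
    '1234567890' => { k1 => 'v1', k2 => 'v2' },
    '1234567891' => { k1 => 'v3', k2 => 'v4' },
);

# Between subscripts the arrow is optional:
# $hoh{$id}{k1} means the same as $hoh{$id}->{k1}.
my $value = $hoh{'1234567891'}{k1};          # 'v3'

# Each inner value is a hash reference; %$row dereferences it as a whole hash.
for my $id (sort keys %hoh) {
    my $row = $hoh{$id};
    for my $k (sort keys %$row) {
        print "$id $k=$row->{$k}\n";
    }
}
```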

I also recommend that you look at CPAN, an archive of Perl modules for many different tasks. Look at Text::CSV. This module will do exactly what you want, and even though its name says "CSV", it works with tab-separated files too.

You forgot to put a backslash in front of the hash you're trying to take a reference to. You have:

$hoh{$id} = %hash;

You probably want:

$hoh{$id} = \%hash;

Also, when you dump a hash with Data::Dumper, pass it a reference to the hash. A hash passed directly to a function is flattened into a list of keys and values, so Dumper prints each key and each value as a separate $VAR, which is exactly the output you got.
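A small sketch of that flattening, using a made-up %hoh and a hypothetical helper count_args that only counts its arguments:

```perl
use strict;
use warnings;
use Data::Dumper;

my %hoh = (a => { x => 1 }, b => { x => 2 });

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar @_ }

my $flat = count_args(%hoh);    # hash flattened into key/value pairs: 4 arguments
my $ref  = count_args(\%hoh);   # a single hash reference: 1 argument

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper(\%hoh);            # one $VAR1 holding the whole structure
```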

You have:

 print "dump: ", Dumper(%hoh);

You should have:

 print "dump: ", Dumper( \%hoh );

My attempt at the program:

#! /usr/bin/env perl
#
use warnings;
use strict;
use autodie;
use feature qw(say);
use Data::Dumper;

use constant {
    FILE    => "test.txt",
};

open my $fh, "<", FILE;

#
# First line with headers
#

my $line = <$fh>;
chomp $line;
my @headers = split /\t/, $line;
my %hash_of_hashes;

#
# Rest of file
#
while ( my $line = <$fh> ) {
    chomp $line;
    my %line_hash;
    my @values = split /\t/, $line;
    for my $index ( 0 .. $#values ) {
        $line_hash{ $headers[$index] } = $values[ $index ];
    }
    $hash_of_hashes{ $line_hash{id} } = \%line_hash;
}

say Dumper \%hash_of_hashes;

Upvotes: 1

choroba

Reputation: 241748

To get a reference, you have to use \:

$hoh{$id} = \%hash;

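%hash is the hash, not a reference to it. In scalar context, it returns the string X/Y, where X is the number of used buckets and Y the total number of buckets in the hash (i.e. nothing useful). Since Perl 5.26, a hash in scalar context returns the number of keys instead, which is still not a reference.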
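A minimal sketch of the difference, with a made-up two-key hash: ref() reports HASH for the backslashed version, while the scalar-context assignment stores a plain value.

```perl
use strict;
use warnings;

my %hash = (k1 => 'v1', k2 => 'v2');

my $ref    = \%hash;   # a reference to the hash
my $scalar = %hash;    # the hash in scalar context: NOT a reference

print ref($ref), "\n";                              # prints "HASH"
print ref($scalar) ? "reference\n" : "plain value\n";   # prints "plain value"
# $scalar is 2 on Perl 5.26 and later, a used/total bucket string on older perls.
print "$scalar\n";
```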

Upvotes: 2
