Reputation: 1547
I'm new to Perl and I'm trying to build a hash of hashes from a TSV file. My current process is to read the file line by line, construct a hash for each line, and then insert it into another hash.
my %hoh = ();
while (my $line = <$tsv>)
{
    chomp $line;
    my %hash;
    my @data = split "\t", $line;
    my $id;
    my $iter = each_array(@columns, @data);
    while(my($k, $v) = $iter->())
    {
        $hash{$k} = $v;
        if($k eq 'Id')
        {
            $id = $v;
        }
    }
    $hoh{$id} = %hash;
}
print "dump: ", Dumper(%hoh);
This outputs:
dump
$VAR1 = '1234567890';
$VAR2 = '17/32';
$VAR3 = '1234567891';
$VAR4 = '17/32';
.....
Instead of what I would expect:
dump
{
'1234567890' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567890'
},
'1234567891' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567891'
},
........
};
My limited understanding is that when I do $hoh{$id} = %hash; it's inserting a reference to %hash? What am I doing wrong? Also, is there a more succinct way to use my columns and data arrays as key/value pairs in my %hash object?
-Thanks in advance, Niru
Upvotes: 0
Views: 750
Reputation: 62089
To get a reference to a hash variable, you need to use \%hash (as choroba said).
A more succinct way to assign values to columns is to assign to a hash slice, like this:
my %hoh = ();
while (my $line = <$tsv>)
{
    chomp $line;
    my %hash;
    @hash{@columns} = split "\t", $line;
    $hoh{$hash{Id}} = \%hash;
}
print "dump: ", Dumper(\%hoh);
A hash slice (@hash{@columns}) means essentially the same thing as ($hash{$columns[0]}, $hash{$columns[1]}, $hash{$columns[2]}, ...) up to however many columns you have. By assigning to it, I'm assigning the first value from split to $hash{$columns[0]}, the second value to $hash{$columns[1]}, and so on. It does exactly the same thing as your while ... $iter loop, just without the explicit loop (and it doesn't extract the $id).
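To make the slice equivalence concrete, here is a small self-contained sketch. The column names and the sample line are made up for illustration; it fills one hash with a slice assignment and another with the long-hand index loop, and both end up with the same contents.

```perl
use strict;
use warnings;

# Hypothetical column names and one tab-separated line, for illustration only
my @columns = ('Id', 'k1', 'k2');
my $line    = "1234567890\tv1\tv2";

# Slice assignment: each key in @columns gets the corresponding
# value returned by split, in order
my %hash;
@hash{@columns} = split /\t/, $line;

# The same thing written long-hand with an explicit index loop
my %long;
my @data = split /\t/, $line;
$long{ $columns[$_] } = $data[$_] for 0 .. $#columns;

# Prints "Id => 1234567890", then "k1 => v1", then "k2 => v2"
print "$_ => $hash{$_}\n" for sort keys %hash;
```

One caveat: split drops trailing empty fields, so a line that ends in empty columns leaves those keys undefined. Passing a negative limit, as in split /\t/, $line, -1, keeps them.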
There's no need to compare each $k to 'Id' inside a loop; just store it in the hash as a normal field and extract it afterwards with $hash{Id}. (Aside: Is your column header Id or id? You use Id in your loop, but id in your expected output.)
If you don't want to keep the Id field in the individual entries, you could use delete (which removes the key from the hash and returns the value):
$hoh{delete $hash{Id}} = \%hash;
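A minimal sketch of that delete trick, using a made-up record: the Id value becomes the outer key, and the inner hash no longer contains Id.

```perl
use strict;
use warnings;

my %hoh;

# Hypothetical record, for illustration only
my %hash = (Id => '1234567890', k1 => 'v1');

# delete removes Id from %hash and returns its value,
# which is then used as the key in %hoh
$hoh{ delete $hash{Id} } = \%hash;

print exists $hoh{'1234567890'}{Id} ? "Id kept\n" : "Id removed\n";   # Id removed
```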
Upvotes: 2
Reputation: 3601
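You should only store a reference to a variable if nothing modifies the variable afterwards; the simplest way to guarantee that is to store it on the last line before the variable goes out of scope. In your script, you declare %hash inside the while loop, so a fresh hash is created on each iteration, and placing this statement last in the loop is safe: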
$hoh{$id} = \%hash;
If it's not the last statement (or you're not sure it's safe), create an anonymous structure to hold the contents of the variable:
$hoh{$id} = { %hash };
This makes a (shallow) copy of %hash, which is slower, but any subsequent changes to %hash will not affect what you stored.
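A short sketch of the difference, with hypothetical keys by_ref and by_copy: after storing both, the original hash is modified, and only the stored reference sees the change.

```perl
use strict;
use warnings;

my %hash = (k1 => 'v1');
my %hoh;

$hoh{by_ref}  = \%hash;      # reference to %hash itself
$hoh{by_copy} = { %hash };   # reference to a new anonymous copy

# Modify %hash after storing it
$hash{k1} = 'changed';

print "by_ref:  $hoh{by_ref}{k1}\n";    # by_ref:  changed
print "by_copy: $hoh{by_copy}{k1}\n";   # by_copy: v1
```

The copy is only one level deep: if %hash itself held references (for example, to arrays), both entries would still share those inner structures.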
Upvotes: 0
Reputation: 107030
Take a look at the documentation included with Perl. The perldoc command is very helpful, and you can also read the same documentation on the Perldoc website.
One of the tutorials there, perlreftut, covers Perl references. It will help clarify a lot of your questions and explains referencing and dereferencing.
I also recommend that you look at CPAN, an archive of Perl modules for all kinds of tasks. Look at Text::CSV: it will do exactly what you want, and even though its name says "CSV", it works with tab-separated files too (set its sep_char option to "\t").
You forgot to put a backslash in front of the hash you're trying to take a reference to. You have:
$hoh{$id} = %hash;
Probably want:
$hoh{$id} = \%hash;
Also, when you use Data::Dumper on a hash, pass it a reference to the hash. If you pass the hash itself, Perl flattens it into a list of alternating keys and values, so Dumper prints each key and each value as a separate $VAR1, $VAR2, ... instead of one nested structure.
You have:
print "dump: ", Dumper(%hoh);
You should have:
print "dump: ", Dumper( \%hoh );
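A small sketch of the difference, using a one-entry hash of hashes made up for illustration. Dumper returns its output as a string, so the two results can be compared directly.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # sort hash keys so the output is stable

my %hoh = ('1234567890' => { k1 => 'v1' });

# Passing %hoh flattens it into a list (key, value), so Dumper
# sees two separate arguments and labels them $VAR1 and $VAR2
my $flat = Dumper(%hoh);

# Passing a reference gives Dumper a single argument: $VAR1,
# which shows the whole nested structure
my $whole = Dumper(\%hoh);

print $flat, $whole;
```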
My attempt at the program:
#! /usr/bin/env perl
#
use warnings;
use strict;
use autodie;
use feature qw(say);
use Data::Dumper;

use constant {
    FILE => "test.txt",
};

open my $fh, "<", FILE;

#
# First line with headers
#
my $line = <$fh>;
chomp $line;
my @headers = split /\t/, $line;

my %hash_of_hashes;

#
# Rest of file
#
while ( my $line = <$fh> ) {
    chomp $line;
    my %line_hash;
    my @values = split /\t/, $line;
    for my $index ( 0 .. $#values ) {
        $line_hash{ $headers[$index] } = $values[ $index ];
    }
    $hash_of_hashes{ $line_hash{id} } = \%line_hash;
}

say Dumper \%hash_of_hashes;
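If you want to try the same approach without creating test.txt, Perl can open a filehandle on a string. The sketch below assumes a lowercase id header and uses made-up sample rows; it also swaps the index loop for a hash slice, which does the same job.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data standing in for test.txt
my $tsv_text = "id\tk1\tk2\n1234567890\tv1\tv2\n1234567891\tv3\tv4\n";

# Open a read handle on the string instead of a file on disk
open my $fh, '<', \$tsv_text or die "Cannot open in-memory file: $!";

# First line holds the column headers
chomp(my $header = <$fh>);
my @headers = split /\t/, $header;

my %hash_of_hashes;
while (my $line = <$fh>) {
    chomp $line;
    my %line_hash;
    @line_hash{@headers} = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    $hash_of_hashes{ $line_hash{id} } = \%line_hash;
}
close $fh;

$Data::Dumper::Sortkeys = 1;
print Dumper(\%hash_of_hashes);
```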
Upvotes: 1
Reputation: 241748
To get a reference, you have to use \:
$hoh{$id} = \%hash;
%hash is the hash, not a reference to it. In scalar context, it returns the string X/Y, where X is the number of used buckets and Y the number of all the buckets in the hash (i.e. nothing useful). (Since Perl 5.26, a hash in scalar context returns its number of keys instead, which is no more useful here.)
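A quick way to see the difference is to check what ref reports. In this sketch (sample data made up for illustration), assigning %hash to a scalar gives a plain, version-dependent value, while \%hash gives a real hash reference.

```perl
use strict;
use warnings;

my %hash = (Id => '1234567890', k1 => 'v1');

my $not_a_ref = %hash;    # scalar context: bucket info on older perls, key count on 5.26+
my $ref       = \%hash;   # a real reference to the hash

print ref($not_a_ref) eq '' ? "plain scalar\n" : "reference\n";   # plain scalar
print ref($ref), "\n";                                             # HASH
print scalar(keys %hash), "\n";                                    # 2
```

If you actually need the number of entries, scalar(keys %hash) works the same on every Perl version.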
Upvotes: 2