Reputation: 633
I'm not sure how to correctly initialize my hash - I'm trying to create a key/value pair for values in coupled lines in my input file.
For example, my input looks like this:
@cluster t.18
46421 ../../../output###.txt/
@cluster t.34
41554 ../../../output###.txt/
I'm extracting the t number from line 1 (@cluster line) and matching it to output###.txt in the second line (line starting with 46421). However, I can't seem to get these values into my hash with the script that I have written.
#!/usr/bin/perl
use warnings;
use strict;

my $key;
my $value;
my %hash;
my $filename = 'input.txt';

open my $fh, '<', $filename or die "Can't open $filename: $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m/^\@cluster/) {
        my @fields = split /(\d+)/, $line;
        my $key = $fields[1];
    }
    elsif ($line =~ m/^(\d+)/) {
        my @output = split /\//, $line;
        my $value = $output[5];
    }
    $hash{$key} = $value;
}
Upvotes: 2
Views: 483
Reputation: 66954
It's a good idea, but the `$key` that is declared with `my` inside the `if` block is a new lexical variable scoped to that block, masking the outer `$key`. Inside the `if` block the symbol `$key` has nothing to do with the one you nicely declared upfront. See `my` in perlsub.
This inner `$key` goes out of scope as soon as the `if` block is done and does not exist outside it. The outer `$key` is visible again after the `if`, and elsewhere in the loop, but it is undefined since it has never been assigned to. The same goes for `$value` in the `elsif` block.
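The scoping issue can be reproduced in isolation. This is a minimal sketch, not the asker's script; the `if (1)` block and the string 'inner' are made up for illustration.

```perl
use strict;
use warnings;

my $key;                      # outer variable, visible in the rest of the file

if (1) {
    my $key = 'inner';        # a NEW variable that masks the outer $key
    print "inside the block: $key\n";
}

# The inner $key is gone here, and the outer one was never assigned.
print "after the block: ", (defined $key ? $key : 'undef'), "\n";
```

The second print reports `undef`, which is the value the loop in the question ends up using for both `$key` and `$value`.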
Just drop the `my` declarations inside the loop, so that you assign to the variables declared at the top (as intended?). So, `$key = ...` and `$value = ...`, and the hash will be assigned correctly.
Note -- this is about how to get that hash assignment right. I don't know how your actual data looks and whether the line is parsed correctly. Here is a toy input.txt
@cluster t.1
1111 ../../../output1.1.txt/
@cluster t.2
2222 ../../../output2.2.txt/
I pick the 4th field instead of the 6th, `$value = $output[3];`, and add
print "$_ => $hash{$_}\n" for keys %hash;
after the loop. This prints
1 => output1.1.txt
2 => output2.2.txt
I am not sure whether this is what you want but the hash is built fine.
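For reference, here is the toy run as one self-contained sketch. Reading the data through an in-memory filehandle and sorting the keys for printing are my assumptions, made so it runs without input.txt and prints in a stable order; everything else follows the question's loop with the two `my` removed.

```perl
use strict;
use warnings;

# Toy data from above, kept in a string instead of input.txt.
my $data = join "\n",
    '@cluster t.1', '1111 ../../../output1.1.txt/',
    '@cluster t.2', '2222 ../../../output2.2.txt/', '';

my ($key, $value, %hash);

open my $fh, '<', \$data or die "Can't open in-memory data: $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m/^\@cluster/) {
        my @fields = split /(\d+)/, $line;
        $key = $fields[1];          # no "my": assign to the outer $key
    }
    elsif ($line =~ m/^(\d+)/) {
        my @output = split /\//, $line;
        $value = $output[3];        # 4th field for this toy input
    }
    $hash{$key} = $value;
}
close $fh;

print "$_ => $hash{$_}\n" for sort keys %hash;
```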
A comment on choice of tools in parsing
You parse the lines for numbers by using the property of `split` that it also returns the separators when they are captured. That is neat, but in some sense it reverses the main purpose of `split`, which is to extract the other components of the string, as delimited by the pattern. This can make the purpose of the code a little convoluted, and you also have to index very precisely to retrieve what you need.
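To see that property of `split` on its own, here is a small sketch using the first line of the question's input. The array names are mine, chosen for illustration.

```perl
use strict;
use warnings;

my $line = '@cluster t.18';

# With a capturing group the separator itself is returned as a field:
# ('@cluster t.', '18')
my @with_capture = split /(\d+)/, $line;

# Without the group the separator is dropped: ('@cluster t.')
my @without_capture = split /\d+/, $line;

print scalar(@with_capture),    " fields with capture\n";
print scalar(@without_capture), " field without capture\n";
print "key candidate: $with_capture[1]\n";
```

The number only shows up because of the parentheses, and it lands at index 1 only because exactly one field precedes it on this line.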
Instead of using `split` to extract the delimiter itself, which is given by a regex, why not extract it by a regex? That makes the intention crystal clear, too. For example, with input
@cluster t.10 has 4319 elements, 0 subclusters
37652 ../../../../clust/output43888.txt 1.397428
the parsing can go as
if ($line =~ m/^\@cluster/) {
    ($key) = $line =~ /t\.(\d+)/;
}
elsif ($line =~ m/^(\d+)/) {
    ($value) = $line =~ m|.*/(\w+\.txt)|;
}
$hash{$key} = $value if defined $key and defined $value;
where `t\.` and `\.txt` are added to specify the targets more precisely. If the target strings aren't certain to have that precise form, just capture `\d+`, and in the second case all non-space characters after the last `/`, say by `m|^\d+.*/(\S+)|`. We use the greediness of `.*`, which matches as much as possible while still letting whatever comes after it (a `/`) match, and thus goes all the way to the very last `/`.
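A quick way to check the greediness argument is to run both patterns on the sample data line. The non-greedy `.*?` variant is my addition for contrast only, and the variable names are made up.

```perl
use strict;
use warnings;

my $line = '37652 ../../../../clust/output43888.txt 1.397428';

# Greedy .* runs to the last "/" that still lets the rest match.
my ($strict) = $line =~ m|.*/(\w+\.txt)|;
my ($loose)  = $line =~ m|^\d+.*/(\S+)|;

# For contrast: non-greedy .*? stops at the first "/" instead,
# so the capture still contains most of the path.
my ($lazy) = $line =~ m|^\d+.*?/(\S+)|;

print "strict: $strict\n";
print "loose:  $loose\n";
print "lazy:   $lazy\n";
```

Both greedy patterns capture only the file name, while the non-greedy one keeps the leading directories.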
Then you can also reduce it to a single regex for each line, for example
if ($line =~ m/^\@cluster\s+t\.(\d+)/) {
    $key = $1;
}
elsif ($line =~ m|^\d+.*/(\w+\.txt)|) {
    $value = $1;
}
Note that I've added a condition to the hash assignment. The original code in fact assigns an `undef` on the first iteration, since no `$value` has been seen at that point. This is overwritten on the next iteration, so we don't see it if we only print the hash afterwards. The condition also guards against failed matches, from malformed lines or such. Of course, far better checks can be run.
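As one possible stricter check, here is a sketch of a variation, not the code above: the assignment moves into the `elsif` branch and the key is cleared after use, so a cluster line can only pair with the data line that follows it. The t.11 record and the malformed line are invented test data, and the in-memory filehandle is again just an assumption to make it runnable.

```perl
use strict;
use warnings;

# Sample lines in the format shown above, plus one malformed data line.
my $data = join "\n",
    '@cluster t.10 has 4319 elements, 0 subclusters',
    '37652 ../../../../clust/output43888.txt 1.397428',
    '@cluster t.11 has 12 elements, 0 subclusters',
    '99999 no path on this line',
    '';

my ($key, %hash);

open my $fh, '<', \$data or die "Can't open in-memory data: $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m/^\@cluster\s+t\.(\d+)/) {
        $key = $1;
    }
    elsif (defined $key and $line =~ m|^\d+.*/(\w+\.txt)|) {
        $hash{$key} = $1;   # store only when both halves of the pair matched
        undef $key;         # forget the key so it cannot pair with a later line
    }
}
close $fh;

print "$_ => $hash{$_}\n" for sort keys %hash;
```

With this data only cluster 10 ends up in the hash. The malformed line after cluster 11 is skipped and does not pick up a value left over from the previous pair.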
Upvotes: 6