Reputation: 73
I'm not an expert in Perl, but I've run into a problem I couldn't fix, even after a lot of searching online. Briefly, I have a hash of hashes like this:
my %HoH = (
chr1 => { start => 30, end => 55, },
chr1 => { start => 18, end => 21, },
chr1 => { start => 30, end => 80, }
);
I would like to filter it by value (that is, produce a new hash of hashes as output). Specifically, given an interval, say 40-60, I want a new hash of hashes containing only the elements that overlap this interval.
In other words, I would like to get this output:
my %HoH = (
chr1 => { start => 30, end => 55, },
chr1 => { start => 30, end => 80, }
);
As a first attempt, I thought I would identify and delete all elements with "end" < 40, and then identify and delete all elements with "start" > 60.
So I just tried:
grep { $HoH{$_}{"end"} < 40 } keys(%HoH);
delete $HoH{$_} for grep { $HoH{$_}{"end"} < 40} keys(%HoH);
But after applying just the first of the two filters, the output contains only the last element, and I really don't understand where the mistake is:
hash size is 1
chr1: start=30 end=80
I printed it with the following:
my $len = keys %HoH;
print "hash size is $len\n";
foreach my $chr ( keys %HoH ) {
print "$chr: ";
for my $position ( keys %{ $HoH{$chr} } ) {
print "$position=$HoH{$chr}{$position} ";
}
print "\n";
}
This seems quite complex to me, and I would be grateful for any help.
Upvotes: 3
Views: 4012
Reputation: 6578
Inspect your hash using Data::Dumper and you'll see that you don't have the data structure you thought you did:
use strict;
use warnings;
use Data::Dumper;
my %HoH = (
chr1 => {
start => 30,
end => 55,
},
chr1 => {
start => 18,
end => 21,
},
chr1 => {
start => 30,
end => 80,
},
);
print Dumper \%HoH;
$VAR1 = {
'chr1' => {
'start' => 30,
'end' => 80
}
};
What's happening is that every pair uses the same key, chr1, so each assignment overwrites the previous one and only the last entry survives. Hash keys must be unique.
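A minimal sketch of the overwrite in isolation: when a hash is initialised from a list with a repeated key, the last occurrence wins, so the three chr1 records collapse into one. The variable $count is illustrative.

```perl
use strict;
use warnings;

# Three pairs share the key 'chr1'; each later pair overwrites
# the value stored by the earlier one, so only the last survives.
my %HoH = (
    chr1 => { start => 30, end => 55 },
    chr1 => { start => 18, end => 21 },
    chr1 => { start => 30, end => 80 },
);

my $count = keys %HoH;    # number of distinct keys (scalar context)
print "keys: $count\n";
print "chr1: start=$HoH{chr1}{start} end=$HoH{chr1}{end}\n";
```

This prints a key count of 1 and the start/end of the last record (30 and 80), matching the Dumper output above.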
Upvotes: 4
Reputation: 53488
As another poster mentions, your problem isn't your filtering code; it's that hashes cannot have duplicate keys:
use strict;
use warnings;
use Data::Dumper;
my %HoH = (
chr1 => { start => 30, end => 55, },
chr2 => { start => 18, end => 21, },
chr3 => { start => 30, end => 80, }
);
# Remove every entry whose end is below 40
delete $HoH{$_} for grep { $HoH{$_}{"end"} < 40 } keys(%HoH);
print Dumper \%HoH;
This works correctly; note the different hash keys. I would point out, though, that you're iterating over your keys, grepping them, and then deleting them. It might be simpler to check both conditions in a single loop:
foreach my $element ( keys %HoH ) {
    # No overlap with 40-60 if the interval ends before 40 or starts after 60
    delete $HoH{$element}
        if ( $HoH{$element}{end} < 40
        or $HoH{$element}{start} > 60 );
}
print Dumper \%HoH;
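Since the question asks for a new hash of hashes rather than modifying the original, here is a sketch that copies the overlapping entries into a fresh hash instead of deleting. It assumes inclusive interval bounds, so an entry overlaps 40-60 when start <= 60 and end >= 40; the names $lo, $hi and %filtered are illustrative.

```perl
use strict;
use warnings;

my %HoH = (
    chr1 => { start => 30, end => 55 },
    chr2 => { start => 18, end => 21 },
    chr3 => { start => 30, end => 80 },
);

my ( $lo, $hi ) = ( 40, 60 );    # query interval (assumed inclusive)

# Keep an entry if it starts no later than $hi and ends no earlier than $lo.
# The parentheses make sure Perl parses the map body as a block.
my %filtered =
    map  { ( $_ => $HoH{$_} ) }
    grep { $HoH{$_}{start} <= $hi && $HoH{$_}{end} >= $lo }
    keys %HoH;

# %HoH is left untouched; %filtered points at the same inner hashes.
for my $chr ( sort keys %filtered ) {
    print "$chr: start=$filtered{$chr}{start} end=$filtered{$chr}{end}\n";
}
```

Only chr1 and chr3 end up in %filtered. Note that the inner hashes are shared references, so changing $filtered{chr1}{start} would also change $HoH{chr1}{start}.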
Alternatively, since your records all share the same name, you could store them in an array of hashes:
use strict;
use warnings;
use Data::Dumper;
my @AoH = (
{ start => 30, end => 55, },
{ start => 18, end => 21, },
{ start => 30, end => 80, }
);
print Dumper \@AoH;
# Keep intervals that overlap 40-60: end at or after 40 and start at or before 60
my @filtered = grep { $_->{end} >= 40 and $_->{start} <= 60 } @AoH;
print Dumper \@filtered;
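Note: in your original example, the standalone grep line computes the same list as the grep inside the delete line and then discards it, so it has no effect. You can also use a compound condition in a single grep to test for both conditions at once.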
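If you need to keep the chromosome name as well as several intervals per chromosome (as in the question, where every record is on chr1), one option is a hash of arrays of hashes, applying the compound grep to each chromosome's list. This layout is a suggested alternative; the names %intervals, %filtered, $lo and $hi are illustrative, and the bounds are assumed to be inclusive.

```perl
use strict;
use warnings;

# One array of intervals per chromosome, so repeated chromosome names are fine.
my %intervals = (
    chr1 => [
        { start => 30, end => 55 },
        { start => 18, end => 21 },
        { start => 30, end => 80 },
    ],
);

my ( $lo, $hi ) = ( 40, 60 );    # query interval (assumed inclusive)

my %filtered;
for my $chr ( keys %intervals ) {
    # Compound condition: keep an interval only if it overlaps [$lo, $hi].
    my @hits = grep { $_->{start} <= $hi && $_->{end} >= $lo } @{ $intervals{$chr} };
    $filtered{$chr} = \@hits if @hits;    # skip chromosomes with no overlaps
}

for my $chr ( sort keys %filtered ) {
    print "$chr: start=$_->{start} end=$_->{end}\n" for @{ $filtered{$chr} };
}
```

When reading records from a file, `push @{ $intervals{$chr} }, { start => $start, end => $end };` appends each record to the right list, and Perl autovivifies the array the first time a chromosome is seen.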
Upvotes: 4