Francesco Gandolfi

Reputation: 73

Perl: how to filter a hash by value (specifying a condition)

I'm not very experienced with Perl, and I ran into a problem that I couldn't fix, even after a long search on the web. Briefly, I have a hash of hashes like this:

my %HoH = (
    chr1 => { start => 30, end => 55, },
    chr1 => { start => 18, end => 21, },
    chr1 => { start => 30, end => 80, }
);

I would simply like to find a way to filter it (I mean, obtain a new hash of hashes as output) by particular values. Specifically, given an interval, let's say 40-60, I want a new hash of hashes containing only the elements that overlap this interval.

In other words, I would like to get as output:

my %HoH = (
    chr1 => { start => 30, end => 55, },
    chr1 => { start => 30, end => 80, }
);

As a first attempt, I thought I would try something like this:

identify and then delete all elements with "end" < 40, and then identify and then delete all elements with "start" > 60.

So I just tried:

grep { $HoH{$_}{"end"} < 40 } keys(%HoH); 
delete $HoH{$_} for grep { $HoH{$_}{"end"} < 40} keys(%HoH);

But right after the first of the two filters, the output contained only the last element, and I really don't understand where the mistake is:

hash size is 1
chr1: start=30 end=80 

printed out with the following:

my $len = keys %HoH;
print "hash size is $len\n";

foreach my $chr ( keys %HoH ) {
   print "$chr: ";
   for my $position ( keys %{ $HoH{$chr} } ) {
      print "$position=$HoH{$chr}{$position} ";
   }
   print "\n";
}

This one seems quite complex to me, so I would be glad if somebody could give me some help.

Upvotes: 3

Views: 4012

Answers (2)

fugu

Reputation: 6578

Inspect your hash using Data::Dumper and you'll see that you don't have the data structure you thought you did:

use strict;
use warnings;
use Data::Dumper;

my %HoH = (
          chr1 => {
                   start => 30,
                   end => 55,
          },
          chr1 => {
                   start => 18,
                   end => 21,
                   },
          chr1 => {
                   start => 30,
                   end => 80,
                   },
            );
            
print Dumper \%HoH;     

$VAR1 = {
          'chr1' => {
                      'start' => 30,
                      'end' => 80
                    }
        };

What's happening is that each later chr1 entry overwrites the earlier one, so only the last entry for chr1 survives. Hash keys must be unique.
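
A minimal sketch of the same effect in isolation, in case it helps: a hash initializer is just a list of key/value pairs, and assigning that list to a hash stores the pairs in order, so a repeated key replaces its earlier value. The names @pairs, $pair_count and $key_count are made up for this illustration.

```perl
use strict;
use warnings;

# The initializer list holds three key/value pairs (six elements)...
my @pairs = (
    chr1 => { start => 30, end => 55 },
    chr1 => { start => 18, end => 21 },
    chr1 => { start => 30, end => 80 },
);

# ...but assigning it to a hash keeps one value per distinct key,
# and the last pair assigned wins.
my %HoH = @pairs;

my $pair_count = @pairs / 2;    # number of pairs in the list
my $key_count  = keys %HoH;     # number of keys actually stored

print "pairs in list: $pair_count\n";
print "keys in hash:  $key_count\n";
print "chr1 end:      $HoH{chr1}{end}\n";
```

Comparing the two counts like this is a quick way to spot keys that were silently overwritten.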

Upvotes: 4

Sobrique

Reputation: 53488

As another poster mentions, your problem isn't your filtering; it's that hashes cannot have duplicate keys:

use strict;
use warnings;
use Data::Dumper;

my %HoH = (
    chr1 => { start => 30, end => 55, },
    chr2 => { start => 18, end => 21, },
    chr3 => { start => 30, end => 80, }
);


grep { $HoH{$_}{"end"} < 40 } keys(%HoH); 
delete $HoH{$_} for grep { $HoH{$_}{"end"} < 40} keys(%HoH);

print Dumper \%HoH;

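This works correctly; note the different hash keys. I would note, though, that you're iterating your keys, grepping them, then deleting them. It might be better to test both conditions in a single loop, deleting anything that ends before 40 or starts after 60:

foreach my $element ( keys %HoH ) {
    # Delete entries that do not overlap the 40-60 interval.
    delete $HoH{$element}
        if ( $HoH{$element}{end}   < 40
          or $HoH{$element}{start} > 60 );
}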

print Dumper \%HoH;
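
Since the question asks for a new hash of hashes rather than an in-place delete, here is a rough sketch of that variant. It assumes "overlapping" means the entry ends at or after the lower bound and starts at or before the upper bound; $lo, $hi, @keep and %filtered are names I've picked for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

my %HoH = (
    chr1 => { start => 30, end => 55 },
    chr2 => { start => 18, end => 21 },
    chr3 => { start => 30, end => 80 },
);

# Interval to test against (values taken from the question).
my ( $lo, $hi ) = ( 40, 60 );

# An entry overlaps [$lo, $hi] when it ends at or after $lo
# and starts at or before $hi (assumed definition of overlap).
my @keep = grep {
    $HoH{$_}{end} >= $lo and $HoH{$_}{start} <= $hi
} keys %HoH;

# Copy the surviving entries into a new hash via hash slices.
# %HoH is left untouched; the inner hashrefs are shared, not deep-copied.
my %filtered;
@filtered{@keep} = @HoH{@keep};

print Dumper \%filtered;
```

With the data above, %filtered ends up with chr1 and chr3, and %HoH still has all three keys.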

You could do what you're trying to do via an array of hashes:

use strict;
use warnings;
use Data::Dumper;

my @AoH = (
    { start => 30, end => 55, },
    { start => 18, end => 21, },
    { start => 30, end => 80, }
);

print Dumper \@AoH;

# Keep only the intervals that overlap 40-60.
my @filtered = grep { $_->{end} >= 40 and $_->{start} <= 60 } @AoH;
print Dumper \@filtered;

Note: in your original example, the standalone grep line builds the same list as the grep inside the delete line and then throws it away, so it can be dropped. You can also do a compound grep to test for both conditions at once.
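
If you need to keep the chromosome name and still hold several intervals under it, one option is a hash of arrays of hashes. This is only a sketch under that assumption; %HoA, $lo, $hi and @hits are hypothetical names, and the overlap test is the same compound grep condition (ends at or after 40, starts at or before 60).

```perl
use strict;
use warnings;
use Data::Dumper;

# Each chromosome maps to an array of intervals, so nothing is overwritten.
my %HoA = (
    chr1 => [
        { start => 30, end => 55 },
        { start => 18, end => 21 },
        { start => 30, end => 80 },
    ],
);

my ( $lo, $hi ) = ( 40, 60 );

my %filtered;
for my $chr ( keys %HoA ) {
    # Compound grep: keep intervals that overlap [$lo, $hi].
    my @hits = grep { $_->{end} >= $lo and $_->{start} <= $hi } @{ $HoA{$chr} };

    # Skip chromosomes with no overlapping interval.
    $filtered{$chr} = \@hits if @hits;
}

print Dumper \%filtered;
```

grep preserves the order of its input, so the two surviving chr1 intervals come out in the order they were listed.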

Upvotes: 4
