fugu

Reputation: 6578

How to compare hash values by key

I have some input data in the following format (tab-delimited):

(gene condition value)

wnt condition1  1
wnt condition2  10
wnt condition3  15
wnt condition4  -1
bmp condition1  10
bmp condition2  inf
bmp condition3  12
bmp condition4  -1
frz condition1  -12
frz condition2  -6
frz condition3  -0.3

I am building a hash of hashes (HoH) as follows:

#!/usr/bin/perl
use warnings;
use strict; 
use File::Slurp;
use Data::Dumper;

my @data = read_file('stack.txt');

my %hash;
foreach (@data){
    chomp;
    # The value may be an integer, a decimal (e.g. -0.3), or +/-inf
    my ($gene, $condition, $value) = (/^(\w+)\t(\w+\d)\t(-?\d+(?:\.\d+)?|-?inf)/);
    $hash{$gene}{$condition} = $value;
}

I want to loop through the HoH and, for each gene, print out its values only if all of them have the same sign: all positive (e.g. 10) or all negative (e.g. -3). For the data above, I would print only:

frz condition1  -12
frz condition2  -6
frz condition3  -0.3

This is because each of the other two genes has a mix of positive and negative values:

wnt condition1  1
wnt condition2  10
wnt condition3  15
wnt condition4  -1 # discrepancy

bmp condition1  10
bmp condition2  inf
bmp condition3  12
bmp condition4  -1 # discrepancy 

I can loop through the HoH as follows, but I am not sure how to compare one value with the 'next' value for the same gene:

for my $gene (sort keys %hash) {
    for my $condition (sort keys %{$hash{$gene}}) {
        my $value = $hash{$gene}{$condition};
        # This only prints the negative values. I want to compare all the
        # values here and print them if they are all positive or all negative.
        print "$gene\t$condition\t$value\n" if $value =~ m/-/;
    }
}

Let me know if I can clarify this further.

Upvotes: 0

Views: 182

Answers (3)

Aaron Miller

Reputation: 3780

Instead of comparing one value with its neighbor in isolation, you can iterate over the entire list of values for a given gene and increment separate counters for positive and negative values, and then compare the counts to see whether a discrepancy exists.

Assuming your data has the following structure:

'bmp' => HASH(0x7324710)
   'condition1' => 10
   'condition2' => 'inf'
   'condition3' => 12
   'condition4' => '-1'
'frz' => HASH(0x7323c78)
   'condition1' => '-12'
   'condition2' => '-6'
   'condition3' => '-0.3'
'wnt' => HASH(0x72a5c30)
   'condition1' => 1
   'condition2' => 10
   'condition3' => 15
   'condition4' => '-1'

This replacement, for the last code block in your question, will give you the result you need:

for my $gene (sort keys %hash) {
    # These variables will contain:
    # - Counts of positive and negative values
    my ($pos_vals, $neg_vals) = (0, 0);
    # - A true/false value indicating whether a discrepancy exists
    my $discrepant;
    # - A list of the values of all conditions for a given gene
    my @values = values %{ $hash{$gene} };

    # For each such value, test for a leading - and increment
    # the positive or negative value count accordingly
    for (@values) { $_ =~ m/^-/ ? $neg_vals++ : $pos_vals++ }

    # If neither counter is zero (i.e. both evaluate true), then
    # a discrepancy exists; otherwise, one doesn't. Either way,
    # we put the test result in $discrepant so as to produce a
    # cleaner test in the following if statement
    $discrepant = (($pos_vals > 0) and ($neg_vals > 0));

    # In the absence of a discrepancy...
    if (not $discrepant) {
        # iterate over the conditions for this gene and print the gene
        # name, the condition name, and the value
        # NB: this statement-modifier form of foreach is idiomatic Perl;
        # you'll see it from time to time, so it's worth knowing about
        print "$gene\t$_\t$hash{$gene}->{$_}\n"
          foreach sort keys %{ $hash{$gene} };
    }
}

NB: This will handle both positive and negative infinity correctly, but will treat zero as positive, which may not be correct for your case. Do zero values occur in your data? If so, should they be treated as positive, negative, or neither?
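If zeros can occur, one option is to classify each value three ways. The sketch below assumes, hypothetically, that zero (including `-0` or `0.0`) should count as neither positive nor negative, so zeros never cause a discrepancy. The sign is read from the text with regexes rather than by numeric comparison, so `inf` and `-inf` are handled without depending on how they numify. The helper name `sign_of` and the extra gene `shh` (not in the question's data) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a value string as 1 (positive),
# -1 (negative) or 0 (zero). Assumption: any zero, including "-0"
# or "0.0", counts as neither positive nor negative.
sub sign_of {
    my ($v) = @_;
    return 0 if $v =~ /^-?(?:0+(?:\.0*)?|\.0+)$/;
    return $v =~ /^-/ ? -1 : 1;    # a leading '-' also covers "-inf"
}

# Sample data from the question, plus a hypothetical gene "shh"
# that mixes a zero with a negative value
my %hash = (
    wnt => { condition1 => '1',   condition2 => '10',  condition3 => '15', condition4 => '-1' },
    bmp => { condition1 => '10',  condition2 => 'inf', condition3 => '12', condition4 => '-1' },
    frz => { condition1 => '-12', condition2 => '-6',  condition3 => '-0.3' },
    shh => { condition1 => '0',   condition2 => '-3' },
);

my @keep;
for my $gene (sort keys %hash) {
    my %seen;    # sign => number of values with that sign
    $seen{ sign_of($_) }++ for values %{ $hash{$gene} };

    # A discrepancy needs at least one positive AND one negative value;
    # zeros never cause one
    next if $seen{1} && $seen{-1};

    push @keep, $gene;
    print "$gene\t$_\t$hash{$gene}{$_}\n" for sort keys %{ $hash{$gene} };
}
```

With this policy `shh` is printed alongside `frz`; if zeros should instead break the "same sign" rule, the `next` test would need to check `$seen{0}` as well.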

Upvotes: 1

Borodin

Reputation: 126722

This code solves the problem by checking all the values in the hash for each gene and incrementing $neg if the value contains a minus sign, otherwise $pos. If either the positive count or the negative count is zero, then all values have the same sign, and the data for that gene is sorted and displayed.

Note that this counts inf and 0 as positive, which may or may not be what is wanted.

Note that using read_file is wasteful, as it pulls the entire file into memory at once. Instead of looping over an array, you may as well use a while loop and read the file line by line. With use autodie there is no need to check whether the call to open succeeded.

use strict;
use warnings;
use autodie;

open my $fh, '<', 'stack.txt';

my %data;

while (<$fh>) {
  chomp;
  my ($gene, $condition, $value) = split /\t/;
  $data{$gene}{$condition} = $value;
}

while (my ($gene, $values) = each %data) {

  my ($pos, $neg) = (0, 0);

  ++(/-/ ? $neg : $pos) for values %$values;

  unless ($neg and $pos) {
    for my $condition (sort keys %$values) {
      printf "%s\t%s\t%s\n", $gene, $condition, $values->{$condition};
    }
  }
}

output

frz condition1  -12
frz condition2  -6
frz condition3  -0.3
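A variation on the same idea, sketched below, swaps the counters for `all` from the core List::Util module (`all` needs List::Util 1.33 or later) and walks the genes with `sort keys` instead of `each`, so the output order is the same on every run. Like the code above, it counts `inf` and `0` as positive. The in-memory string opened as a filehandle only stands in for stack.txt so the sketch runs on its own.

```perl
use strict;
use warnings;
use List::Util qw(all);    # all() needs List::Util 1.33 or later

# The question's sample rows, joined into an in-memory "file"
# that stands in for stack.txt
my @rows = (
    [qw(wnt condition1 1)],   [qw(wnt condition2 10)],
    [qw(wnt condition3 15)],  [qw(wnt condition4 -1)],
    [qw(bmp condition1 10)],  [qw(bmp condition2 inf)],
    [qw(bmp condition3 12)],  [qw(bmp condition4 -1)],
    [qw(frz condition1 -12)], [qw(frz condition2 -6)],
    [qw(frz condition3 -0.3)],
);
my $text = join '', map { join("\t", @$_) . "\n" } @rows;
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my %data;
while (my $line = <$fh>) {
    chomp $line;
    my ($gene, $condition, $value) = split /\t/, $line;
    $data{$gene}{$condition} = $value;
}
close $fh;

my @output;
for my $gene (sort keys %data) {    # sorted, so the order is repeatable
    my @values = values %{ $data{$gene} };

    # Same sign: every value has a leading '-', or none does
    # (like the code above, this counts 0 and inf as positive)
    next unless (all { /^-/ } @values) || (all { !/^-/ } @values);

    push @output, "$gene\t$_\t$data{$gene}{$_}" for sort keys %{ $data{$gene} };
}
print "$_\n" for @output;
```

The parentheses around each `all` call matter: without them, `||` would be parsed as part of the first call's argument list.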

Upvotes: 1

jing yu

Reputation: 264

use feature 'say';    # needed for say()

my @data = <$your_file_handle>;

my %hash;
foreach (@data){
    chomp;
    my ($gene, $condition, $value) = split; #Sorry, your regex didn't work for me,
                                            #hence the change.
    $hash{$gene}{$condition} = $value;
}

for my $gene (sort keys %hash){
    my $values = join '', values %{$hash{$gene}};
    my $num = keys %{$hash{$gene}};  #Number of conditions

    #when no '-' is found, or the number of '-' characters equals the number of conditions, print.
    say $gene if ($values !~ /-/ or $values =~ tr/-/-/ == $num);
}
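The idea in this answer, comparing how many values are negative with how many conditions the gene has, can also be written without joining the values into one string. The sketch below counts negative values with `grep` in scalar context and the conditions with `keys` in scalar context; a gene passes if none or all of its values are negative. Counting matching values rather than '-' characters also avoids a miscount if a value ever contains more than one hyphen (for example `-1e-5`). The sample hash is copied from the question's data.

```perl
use strict;
use warnings;

# Sample data from the question
my %hash = (
    wnt => { condition1 => '1',   condition2 => '10',  condition3 => '15', condition4 => '-1' },
    bmp => { condition1 => '10',  condition2 => 'inf', condition3 => '12', condition4 => '-1' },
    frz => { condition1 => '-12', condition2 => '-6',  condition3 => '-0.3' },
);

my @same_sign;
for my $gene (sort keys %hash) {
    my $conds = $hash{$gene};
    my $total = keys %$conds;                    # number of conditions
    my $neg   = grep { /^-/ } values %$conds;    # number of negative values

    # Keep the gene when no value, or every value, is negative
    push @same_sign, $gene if $neg == 0 || $neg == $total;
}
print "$_\n" for @same_sign;
```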

Upvotes: -1
