Reputation: 6578
I have some input data in the following format (tab-delimited):
(gene condition value)
wnt condition1 1
wnt condition2 10
wnt condition3 15
wnt condition4 -1
bmp condition1 10
bmp condition2 inf
bmp condition3 12
bmp condition4 -1
frz condition1 -12
frz condition2 -6
frz condition3 -0.3
I am building a HoH (hash of hashes) as follows:
#!/usr/bin/perl
use warnings;
use strict;
use File::Slurp;
use Data::Dumper;
my @data = read_file('stack.txt');
my %hash;
foreach (@data){
    chomp;
    my ($gene, $condition, $value) = (/^(\w+)\t(\w+\d)\t(-?\d+|-?inf)/);
    $hash{$gene}{$condition} = $value;
}
I want to loop through the HoH and, for each gene, print out the values, provided that all values for that gene are either all positive (e.g. 10) or all negative (e.g. -3). For the data above I would only print out:
frz condition1 -12
frz condition2 -6
frz condition3 -0.3
Both other genes are excluded because they contain a mix of positive and negative values:
wnt condition1 1
wnt condition2 10
wnt condition3 15
wnt condition4 -1 # discrepancy
bmp condition1 10
bmp condition2 inf
bmp condition3 12
bmp condition4 -1 # discrepancy
I can loop through the HoH as follows, but I am not sure how to compare one value with the 'next' value for the same gene:
for my $gene (sort keys %hash) {
    for my $condition (sort keys %{$hash{$gene}}) {
        my $value = $hash{$gene}{$condition};
        # This obviously will only print out negative values. I want to compare all values here,
        # and if they are all positive, or all negative, print them.
        print "$gene\t$condition\t$value\n" if $value =~ m/-/;
    }
}
Let me know if I can clarify this further.
Upvotes: 0
Views: 182
Reputation: 3780
Instead of comparing one value with its neighbor in isolation, you can iterate over the entire list of values for a given gene and increment separate counters for positive and negative values, and then compare the counts to see whether a discrepancy exists.
Assuming your data matches the following scheme:
'bmp' => HASH(0x7324710)
   'condition1' => 10
   'condition2' => 'inf'
   'condition3' => 12
   'condition4' => '-1'
'frz' => HASH(0x7323c78)
   'condition1' => '-12'
   'condition2' => '-6'
   'condition3' => '-0.3'
'wnt' => HASH(0x72a5c30)
   'condition1' => 1
   'condition2' => 10
   'condition3' => 15
   'condition4' => '-1'
This replacement, for the last code block in your question, will give you the result you need:
for my $gene (sort keys %hash) {
    # These variables will contain:
    # - Counts of positive and negative values
    my ($pos_vals, $neg_vals) = (0, 0);
    # - A true/false value indicating whether a discrepancy exists
    my $discrepant = undef;
    # - A list of the values of all conditions for a given gene,
    #   collected from this gene's inner hash
    my @values = values %{ $hash{$gene} };
    # For each such value, test for a leading - and increment
    # the positive or negative value count accordingly
    for (@values) { $_ =~ m/^-/ ? $neg_vals++ : $pos_vals++ }
    # If neither counter is zero (i.e. both evaluate true), then
    # a discrepancy exists; otherwise, one doesn't. Either way,
    # we put the test result in $discrepant so as to produce a
    # cleaner test in the following if statement
    $discrepant = (($pos_vals > 0) and ($neg_vals > 0));
    # In the absence of a discrepancy...
    if (not $discrepant) {
        # iterate over the conditions for this gene and print the gene
        # name, the condition name, and the value
        # NB: this postfix foreach is somewhat idiomatic Perl, but you'll
        # tend to see it from time to time and it's thus worth knowing about
        print "$gene\t$_\t$hash{$gene}->{$_}\n"
            foreach sort keys %{ $hash{$gene} };
    }
}
NB: This will handle both positive and negative infinity correctly, but will treat zero as positive, which may not be correct for your case. Do zero values occur in your data? If so, should they be treated as positive, negative, or neither?
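To make the zero question concrete: if zero should count as neither positive nor negative, one option is to compare numerically with <=> instead of testing for a leading minus sign. This is a minimal sketch, not a drop-in replacement: same_sign is a hypothetical helper, the sample %hash just mirrors the structure above, and it assumes a reasonably recent Perl that numifies the string 'inf' to infinity. Under this rule a gene passes only if its values are all positive, all negative, or all zero.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): returns true when every
# value falls into the same class according to <=> (-1 negative, 0 zero,
# 1 positive). Zero is its own class: all-zero passes, but zero mixed with
# positive or negative values fails.
sub same_sign {
    my %seen;
    $seen{ $_ <=> 0 }++ for @_;
    return keys(%seen) <= 1;
}

# Sample data mirroring the question's HoH
my %hash = (
    wnt => { condition1 => 1,   condition2 => 10,    condition3 => 15, condition4 => -1 },
    bmp => { condition1 => 10,  condition2 => 'inf', condition3 => 12, condition4 => -1 },
    frz => { condition1 => -12, condition2 => -6,    condition3 => -0.3 },
);

for my $gene (sort keys %hash) {
    next unless same_sign(values %{ $hash{$gene} });
    print "$gene\t$_\t$hash{$gene}{$_}\n" for sort keys %{ $hash{$gene} };
}
```

Run as-is, it should print only the three frz lines.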
Upvotes: 1
Reputation: 126722
This code solves the problem by checking all the values in the hash for each gene and incrementing $neg if the value contains a minus sign, otherwise $pos. If either the positive count or the negative count is zero, then all values are of the same sign and the data for that gene is sorted and displayed.
Note that this counts inf and 0 as positive, which may or may not be what is wanted.
Note that using read_file is wasteful as it pulls the entire file into memory at once. Instead of looping over an array you may as well use a while loop and read from the file line by line. With use autodie there is no need to check the success of the file open call.
use strict;
use warnings;
use autodie;
open my $fh, '<', 'stack.txt';
my %data;
while (<$fh>) {
    chomp;
    my ($gene, $condition, $value) = split /\t/;
    $data{$gene}{$condition} = $value;
}
while (my ($gene, $values) = each %data) {
    my ($pos, $neg) = (0, 0);
    ++(/-/ ? $neg : $pos) for values %$values;
    unless ($neg and $pos) {
        for my $condition (sort keys %$values) {
            printf "%s\t%s\t%s\n", $gene, $condition, $values->{$condition};
        }
    }
}
output
frz condition1 -12
frz condition2 -6
frz condition3 -0.3
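An alternative to maintaining counters is to ask List::Util's all function directly whether every value is positive or every value is negative. A minimal sketch under stated assumptions: all_same_sign is a hypothetical name, all requires List::Util 1.33 or later (bundled with recent Perls), and the comparison is numeric, so inf counts as positive on a Perl that numifies 'inf', while any 0 makes a gene fail both tests, unlike the minus-sign test above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(all);    # all() needs List::Util 1.33 or later

# Hypothetical helper: true when every value is > 0, or every value is < 0.
# Comparing numerically means 0 satisfies neither test, so any zero
# makes the gene fail (unlike the regex-based counting above).
sub all_same_sign {
    my @v = @_;
    my $all_pos = all { $_ > 0 } @v;
    my $all_neg = all { $_ < 0 } @v;
    return $all_pos || $all_neg;
}

# Sample data in the same shape as %data above
my %data = (
    wnt => { condition1 => 1,   condition2 => 10,    condition3 => 15, condition4 => -1 },
    bmp => { condition1 => 10,  condition2 => 'inf', condition3 => 12, condition4 => -1 },
    frz => { condition1 => -12, condition2 => -6,    condition3 => -0.3 },
);

for my $gene (sort keys %data) {
    my $values = $data{$gene};
    next unless all_same_sign(values %$values);
    printf "%s\t%s\t%s\n", $gene, $_, $values->{$_} for sort keys %$values;
}
```

Because all stops at the first value that fails its test, a gene with an early mismatch is rejected without examining the rest of its values.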
Upvotes: 1
Reputation: 264
use feature 'say';    # say() is not available without this (or use v5.10)
my @data = <$your_file_handle>;
my %hash;
foreach (@data){
    chomp;
    my ($gene, $condition, $value) = split; #Sorry, your regex didn't work for me,
                                            #hence the change.
    $hash{$gene}{$condition} = $value;
}
for my $gene (sort keys %hash){
    my $values = join '', values %{ $hash{$gene} };
    my $num = keys %{ $hash{$gene} }; #Number of conditions
    #print when no '-' is found, or when the number of '-' equals the number of conditions.
    say $gene if ($values !~ /-/ or $values =~ tr/-/-/ == $num);
}
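One likely reason the question's regex fails on this data is that -?\d+ has no room for a decimal point: on -0.3 it matches only -0, and because nothing anchors the end of the line the match still succeeds, so the hash silently stores -0. A minimal sketch comparing the original pattern with a hypothetical revision that accepts an optional fractional part (parse_value is an illustrative helper, not from either post):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the question: -?\d+ has no room for a decimal point,
# so on "-0.3" it stops after "-0" (nothing anchors the end of the line).
my $original = qr/^(\w+)\t(\w+\d)\t(-?\d+|-?inf)/;

# Hypothetical revision: allow an optional fractional part after the digits.
my $revised = qr/^(\w+)\t(\w+\d)\t(-?(?:inf|\d+(?:\.\d+)?))/;

# Return the third capture (the value) of $re matched against $line.
sub parse_value {
    my ($line, $re) = @_;
    my (undef, undef, $value) = $line =~ $re;
    return $value;
}

my $line = "frz\tcondition3\t-0.3";
print 'original: ', parse_value($line, $original), "\n";    # prints "original: -0"
print 'revised:  ', parse_value($line, $revised), "\n";     # prints "revised:  -0.3"
```

Splitting on whitespace, as this answer does, sidesteps the value pattern entirely.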
Upvotes: -1