theo4786

Reputation: 159

find max value in each row, return row ID and column header of max value

I have a file (19,000 lines and 200 columns) that looks like:

GENE CLM-MXL-ACB_PBS CLM-PEL-ACB_PBS CLM-PUR-ACB_PBS 
A1BG 2.927682 2.935334 -0.262044
A1CF -0.503390 -0.193219 0.038984
A2M -0.217628 -0.264332 -0.380048
A2ML1 -0.040747 0.566124 0.935753

I would like to find the maximum value for each row, and print the max value, row ID, and column header of the column the max was found in. Output would look like:

 A1BG CLM-PEL-ACB_PBS 2.935334
 A1CF CLM-PUR-ACB_PBS 0.038984
 A2M CLM-MXL-ACB_PBS -0.217628
 A2ML1 CLM-PUR-ACB_PBS 0.935753

I have a Perl script that does not work:

my$F=shift@ARGV;
open IN, "$F";

my@line;
while (<IN>){
    next if m/GENE/;
    foreach my $l (@line) {
        my ($row_id, @row_data) = split( / /, $l ) ;
        my $idx = 0 ;
        do { $idx = $_ if $row[$_] > $row[$idx] } for 1..$#row_data ;
        print "$row_id $idx\n" ;
        next;
    }
}

Can someone point out what's wrong with the code or suggest another solution?

Upvotes: 0

Views: 384

Answers (1)

Tanktalus

Reputation: 22294

I would first save the header row so the column name can be printed later. Then, for each line, I store every field under its header name and look for the column with the largest value. Normally I'd use List::Util::max, but that returns the maximum itself rather than the key it belongs to, so I use List::Util::reduce to carry the key along instead. After that, printing the row ID, the column name and the maximum is straightforward. To keep the example short, your data is inlined below, but the same loop works on any filehandle. (I would also replace your open statement with open my $in, '<', $F or die "Can't read from $F: $!" for better error detection and handling.)

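#!/opt/myperl/5.20.2/bin/perl

use 5.10.0;
use strict;
use warnings;
use List::Util qw(reduce);

# The first line holds the column names; split ' ' also drops the trailing space.
my @header = do {
    my $line = <DATA>;
    split ' ', $line;
};

while (<DATA>)
{
    next unless /\S/;    # skip blank lines

    # Map each column name to this row's value.
    my @fields = split ' ', $_;
    my %data; @data{@header} = @fields;

    # reduce keeps whichever column name has the larger value; the GENE column is skipped.
    my $max_key = reduce { $data{$a} > $data{$b} ? $a : $b } @header[1..$#header];

    # Row ID, column holding the maximum, maximum value.
    say join ' ', $data{$header[0]}, $max_key, $data{$max_key};
}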


__END__
GENE CLM-MXL-ACB_PBS CLM-PEL-ACB_PBS CLM-PUR-ACB_PBS 
A1BG 2.927682 2.935334 -0.262044
A1CF -0.503390 -0.193219 0.038984
A2M -0.217628 -0.264332 -0.380048
A2ML1 -0.040747 0.566124 0.935753
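
For comparison, here is a minimal sketch that keeps the index-based loop from the question rather than the hash-and-reduce approach above. It assumes the three things that stop the original script from producing output are that @line is never filled (so the inner foreach never runs), that the comparison reads @row instead of @row_data, and that the header line is skipped, so only an index could be printed. The helper name max_per_row and the in-memory sample are illustrative assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): reads a header line and
# data rows from $fh and returns one "row_id column_name max_value" string
# per non-blank data row.
sub max_per_row {
    my ($fh) = @_;

    # Keep the header so an index can be turned back into a column name.
    my $header_line = <$fh>;
    return unless defined $header_line;
    my ($id_name, @columns) = split ' ', $header_line;

    my @report;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($row_id, @values) = split ' ', $line;

        # Index of the largest value; on ties the first one wins.
        my $idx = 0;
        for my $i (1 .. $#values) {
            $idx = $i if $values[$i] > $values[$idx];
        }
        push @report, "$row_id $columns[$idx] $values[$idx]";
    }
    return @report;
}

# Sample rows from the question, read through an in-memory filehandle.
# For a real file, open it instead:
#   open my $in, '<', $file or die "Can't read from $file: $!";
my $sample = <<'END';
GENE CLM-MXL-ACB_PBS CLM-PEL-ACB_PBS CLM-PUR-ACB_PBS
A1BG 2.927682 2.935334 -0.262044
A1CF -0.503390 -0.193219 0.038984
A2M -0.217628 -0.264332 -0.380048
A2ML1 -0.040747 0.566124 0.935753
END

open my $in, '<', \$sample or die "Can't open in-memory handle: $!";
print "$_\n" for max_per_row($in);
close $in;
```

Because the values are never reformatted, each maximum is printed exactly as it appears in the input. Working with array indices avoids building a hash per row, which may matter slightly with 200 columns and 19,000 lines, though I have not measured it.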

Upvotes: 2
