Want to filter for the max result and print from a table that contains many results for multiple scenarios

Question

I have a CSV table containing the merged data for 1024 independent variables and the 25 dependent variables associated with them. For each independent variable (numbered 1 .. 1024), I have 10 different outcomes. I would like to

choose the best result for each independent variable, and
pipe the line containing that information into a new CSV file.

It seems like a fairly easy thing to ask of Perl, and maybe it would be simple to do with a hash of arrays of arrays, but I'm still confused about how to implement something like that for this collection of data.

Current code

I found a very helpful Q&A from 2009 on printing matching lines. It works fairly well after some tinkering, but a few issues remain:

I have to pre-sort the file so that my maximum value is the first value that appears for each case.
I also miss out on getting the best result for the first independent variable and
in some instances I get multiple lines returned to me instead of just the maximum value.

I'm fairly sure there must be an easier way to do this, and I would greatly appreciate any help and/or constructive criticism on my (ripped-off) script.

Thank you!

This is what I have so far:

#!/usr/bin/perl 
use warnings;
use strict;
unless ($#ARGV == 0) {
  print "USAGE:  get_best.pl  csvfile\n";
  exit;
}
### this is a script to get the best "score"
my $input = $ARGV[0];
my $outfile = "bestofthebest.csv";
if (-e $outfile ) {    
  system "rm $outfile";
}
open(my $fh,'<',"$input") || die "could not open $input"; #try to open input
open (SUMMARY, ">>","$outfile") || die "could not open $outfile"; #open output file for writing
my $this_line = "";
my $do_next = 0;

while (<$fh>) {
  chomp($_);
  my $last_line = $this_line;
  $this_line = $_;
  if ($this_line =~ m/Seq/) {
    print SUMMARY "$this_line\n";
    next;
  }
  my ($compound,     $rank,     $nnme,     $G1,     ..., $res1,     $res2,     $res3,     $res4,     $res5,     $res6    ) = split(/\s+/, $this_line, 26);
  my ($compound_old, $rank_old, $nnme_old, $G1_old, ..., $res1_old, $res2_old, $res3_old, $res4_old, $res5_old, $res6_old) = split(/\s+/, $last_line, 26);
  foreach ($compound == $compound_old) {
    if (($G1 >= $G1_old)){
      print SUMMARY "$this_line\n";
      print "\n $G1 G1 is >> $G1_old G1_old loop\n";
      print "\n compound is $compound G1 is $G1\n";
      $do_next = 1;
    }
    else {
      $last_line = "";
      $do_next = 0;
    } 
  }
}
close ($fh);
close (SUMMARY);

Example input

This is what the input data looks like (I've left off some columns and rows, obviously)

10  8   3   -18.08  -1.4    -16.68  -15.94  -2.13   -9.45
11  10  4   -15.2   3.2 -18.4   -18.02  2.82    -5
11  5   4   -15.22  2.71    -17.92  -15.88  0.66    -4.51
11  7   4   -14.06  3.84    -17.89  -16.7   2.64    -5.73
11  4   4   -16.63  0.48    -17.1   -15.75  -0.87   -5.92
11  6   4   -15.21  1.83    -17.04  -18.41  3.21    -7
11  9   4   -15.18  1.82    -17 -16.56  1.38    -7.09
11  8   4   -14.98  1.93    -16.91  -16.78  1.79    -10.81
11  2   4   -18.75  -1.95   -16.8   -17.83  -0.92   -7.35
11  1   4   -19.67  -3.17   -16.5   -16.4   -3.27   -9.01
11  3   4   -16.69  -0.54   -16.14  -16.35  -0.34   -9.17
12  7   4   -19.54  -1.14   -18.41  -17.74  -1.81   -2.79
12  9   4   -19.09  -1.01   -18.08  -16.01  -3.09   -5.56
12  4   4   -19.48  -2.18   -17.3   -16.34  -3.14   -4
12  2   4   -19.86  -2.77   -17.1   -15.97  -3.9    -2.96
12  8   4   -19.49  -2.45   -17.03  -16.39  -3.1    -7.19
12  1   4   -20.28  -3.33   -16.95  -17.12  -3.16   -5.18
12  3   4   -18.78  -1.93   -16.86  -17.81  -0.98   -5.39
12  5   4   -19.63  -2.86   -16.77  -16.41  -3.22   -6.54
12  6   4   -19.81  -3.25   -16.56  -16.53  -3.27   -7.19
12  10  4   -19.39  -2.95   -16.44  -17.42  -1.97   -7.67
13  1   3   -13.05  6.35    -19.4   -18.71  5.66    -6.43
13  8   3   -21.44  -2.32   -19.11  -17.08  -4.36   -1.93
13  3   3   -16 2.94    -18.94  -19.24  3.24    -2.78
13  2   3   -13.79  4.9 -18.7   -17.35  3.56    -4.72
13  6   3   -22.08  -3.4    -18.68  -20.12  -1.96   -6.74
13  9   3   -18.98  -0.32   -18.66  -15.97  -3.01   -3.06
13  7   3   -20.4   -2.08   -18.32  -18.24  -2.17   -5.71
13  5   3   -19.94  -1.62   -18.32  -19.42  -0.52   -7.44
13  10  3   -19.26  -1.25   -18.01  -17.52  -1.74   -5.68
13  4   3   -17.75  -1.33   -16.42  -17.75  0   -9.15
14  9   3   -22.23  -3.43   -18.79  -16.68  -5.55   -3.91
14  5   3   -21.32  -2.95   -18.37  -18.08  -3.24   -6.03
14  7   3   -24.25  -6.29   -17.96  -18.78  -5.47   -9.21
14  6   3   -21.03  -3.14   -17.89  -19.17  -1.86   -10.11
14  4   3   -21.59  -3.93   -17.67  -19.32  -2.28   -6.55
14  1   3   -22.43  -4.79   -17.63  -18.09  -4.34   -5.63

Current Output:

10  2   3   -10.11  8.94    -19.04  -18.48  8.38    -4.09
11  5   4   -15.22  2.71    -17.92  -15.88  0.66    -4.51
12  7   4   -19.54  -1.14   -18.41  -17.74  -1.81   -2.79
12  6   4   -19.81  -3.25   -16.56  -16.53  -3.27   -7.19
13  8   3   -21.44  -2.32   -19.11  -17.08  -4.36   -1.93
14  9   3   -22.23  -3.43   -18.79  -16.68  -5.55   -3.91
15  10  4   -21.51  -1.51   -20 -17.63  -3.88   -2.45
16  5   4   -17.81  2.56    -20.37  -19.09  1.28    -1.19
16  2   4   -16.61  1.97    -18.58  -21.06  4.45    -6.47
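
One way to avoid both the pre-sorting and the line-to-line comparison is to keep, for each compound, only the best line seen so far in a hash. The following is a minimal sketch rather than a drop-in replacement. It assumes that "best" means the numerically largest value in the fourth column (the one the script calls $G1), that the compound number is the first column, and that fields are separated by whitespace. The name best_per_group is made up for illustration.

```perl
use strict;
use warnings;

# For each distinct value in column $key_col, keep the line whose value in
# column $score_col is numerically largest. Column indices are 0-based.
# Results come back in the order the keys first appeared in the input.
sub best_per_group {
    my ($lines, $key_col, $score_col) = @_;
    my (%best, @order);
    for my $line (@$lines) {
        my @f = split ' ', $line;                        # split on any whitespace
        next if $#f < $key_col || $#f < $score_col;      # skip blank or short lines
        my ($key, $score) = @f[$key_col, $score_col];
        if (!exists $best{$key}) {
            push @order, $key;                           # remember first-seen order
        }
        elsif ($score <= $best{$key}{score}) {
            next;                                        # not better; ties keep the earlier line
        }
        $best{$key} = { score => $score, line => $line };
    }
    return map { $best{$_}{line} } @order;
}

# Demonstration on a few rows copied from the example input
# (first four columns only).
my @rows = (
    "11  10  4   -15.2",
    "11  7   4   -14.06",
    "11  1   4   -19.67",
    "12  7   4   -19.54",
    "12  3   4   -18.78",
    "12  1   4   -20.28",
);
print "$_\n" for best_per_group(\@rows, 0, 3);
```

Because every line is compared against the stored best for its own compound, the input order does not matter, the first compound is handled like any other, and exactly one line per compound is returned. A header line (the script above detects it with m/Seq/) is not handled here and would need to be printed separately before the data lines are passed in.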
