Newbie_Mark

Reputation: 17

Getting the maximum value per column in a multidimensional array

I am trying to get the highest value in each of columns A, B, and C and tie those values to the date and time at which they occurred. To get the max value I just need to use the List::Util module, but I am not sure how to point it at a particular column in the array and then reference the date and time.

DATE    TIME    A   B   C
11/22/14    21:00:00    5,854   2,105   1,290
11/22/14    21:02:35    7,692   2,593   2,649
11/22/14    21:05:10    1,639   458 444
11/22/14    21:07:00    1,032   487 434
11/22/14    21:08:15    4,707   1,352   646
11/22/14    21:10:22    351 46  162
11/22/14    21:10:55    5,507   1,943   957
11/22/14    21:11:00    1,703   647 516
11/22/14    21:12:00    2,359   751 785
11/22/14    21:14:05    67  25  44
11/22/14    21:16:25    4,072   1,596   1,050
11/22/14    21:17:48    5,060   2,131   1,996
11/22/14    21:19:00    341 42  137
11/22/14    21:23:00    1,308   71  634

Upvotes: 1

Views: 1138

Answers (3)

G. Cito

Reputation: 6378

Here is a mechanical, procedural approach that is slightly more advanced than "baby Perl". The script is fairly simple and sticks to well-known Perl idioms.

  • First we slurp all of DATA into @lines - an array of arrays - using map - we could use a while (<DATA>){...} loop here instead. We remove the commas while we are here (tr/,//d)

  • We then use per-column temporary arrays (@atmp, @btmp, ...) and populate them with a direct sort of @lines, accessing the relevant column in the internal anonymous array ($a->[n] ...) for the sort operation: this way we don't use a module and avoid using map.

  • Once we have (reverse) sorted the arrays by column, we can print the first element to get the highest value for each column. To print, we dereference the first element (e.g. @{$atmp[0]} since it is an anonymous array) to get back the whole line - that way we keep the "highest value" together with the other columns for our output.

NB to highlight the columnar sort I changed the original data so that different lines appear as the maximum for each column. In the original data the second line has the highest value for all three columns. I'm using tr|,||d instead of tr/,//d only to unbreak the SO syntax highlighter.

use v5.16;      # enables strict (but not warnings)
use warnings;

my @lines = map { tr|,||d; [ split ] } <DATA> ; 
shift @lines;   # removes header

my @atmp = sort{ $b->[2]<=>$a->[2] } @lines;     
my @btmp = sort{ $b->[3]<=>$a->[3] } @lines;     
my @ctmp = sort{ $b->[4]<=>$a->[4] } @lines ;   

print "\t DATE     TIME    A    B    C \n";
print "Max A: @{$atmp[0]}\nMax B: @{$btmp[0]}\nMax C: @{$ctmp[0]}\n" ;  

__DATA__    
DATE    TIME    A   B   C     
11/22/14    21:00:00    5,854   2,105   1,290    
11/22/14    21:02:35    7,692   2,593   2,649    
11/22/14    21:05:10    1,639   458 444  
11/22/14    21:07:00    1,032   487 434
11/22/14    21:08:15    4,707   1,352   6460 
11/22/14    21:10:22    351 46  162 
11/22/14    21:10:55    5,507   9,943   957  
11/22/14    21:11:00    1,703   647 516  
11/22/14    21:12:00    2,359   751 785      
11/22/14    21:14:05    67  25  44          
11/22/14    21:16:25    4,072   1,596   1,050 
11/22/14    21:17:48    5,060   2,131   1,996    
11/22/14    21:19:00    341 42  137 
11/22/14    21:23:00    1,308   71  634  

output:

~/$ perl sort_by_column.pl

         DATE     TIME    A    B    C 
Max A: 11/22/14 21:02:35 7692 2593 2649
Max B: 11/22/14 21:10:55 5507 9943 957
Max C: 11/22/14 21:08:15 4707 1352 6460

  • To better understand the data structures the script creates, add use DDP; followed by p @lines, p @atmp, etc. It can be a helpful tool for "visualization". See Data::Printer for more details.
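If Data::Printer is not installed, the core Data::Dumper module gives a similar (less pretty) view. This is only a rough sketch: the two rows are the first two data lines from the question with the commas already removed, and the $dump variable is just a name chosen for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent = 1;    # more compact nested output

# Same shape as @lines in the script above: an array of anonymous arrays
my @lines = (
    [ '11/22/14', '21:00:00', 5854, 2105, 1290 ],
    [ '11/22/14', '21:02:35', 7692, 2593, 2649 ],
);

# Pass a reference so the whole structure is dumped as one value
my $dump = Dumper( \@lines );
print $dump;
```

Each inner array shows up as its own nested block, which makes it easier to see why $a->[2] in the sort refers to column A.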

Upvotes: 1

David W.

Reputation: 107060

My understanding:

You want three pieces of data at the end:

  • The date and time for the highest number in column A
  • The date and time for the highest number in column B
  • The date and time for the highest number in column C

You don't want to sort the data. You don't need to store anything else.

You didn't post any code, so I'm not going to give you a complete program. You need to try that yourself. Instead, I'll give you some hints:

You should read up on references and how they work. We'll store your data in a hash of hash references:

my %data;
$data{A}->{time} = "xxxxx";   # Time of highest item in column "A"
$data{A}->{date} = "xxxxx";   # Date of highest item in column "A"
$data{A}->{value} = "xxxx";   # Value of highest item in column "A"

Same with the other two columns.

You can loop through your data (is it a file? You didn't explain)

my %data;
while ( my $line .... ) {
    $line =~ tr/,//d;    # remove thousands separators so the values compare as numbers
    my ( $date, $time, $a_value, $b_value, $c_value ) = split /\s+/, $line;
    if ( not exists $data{A}->{value} or $a_value > $data{A}->{value} ) {
        $data{A}->{value} = $a_value;
        $data{A}->{date}  = $date;
        $data{A}->{time}  = $time;
    }
    ...    # Same for B and C
}

In this loop, I set $data{A}->{value} to the value I just read in if that value is higher than the one I had previously stored. It's a common way of looking for the highest value. I also need to store it if that value doesn't already exist. Thus, I check to see if $data{A}->{value} exists or not. If it doesn't, I need to store that value anyway. (I could have done just if ( not exists $data{A} or $a_value > $data{A}->{value} )).

After your while loop, %data will contain the highest values of each column, and the date and time for those values. There's a lot of code repetition. I could have added an inner loop for each column, but it's not worth the effort with just three columns.
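For what it's worth, the inner-loop variant might look roughly like the sketch below. It assumes the input is already in an array (@input, a hypothetical name holding a few lines from the question) and that the value columns always follow DATE and TIME in the order listed in @columns.

```perl
use strict;
use warnings;

my @columns = qw(A B C);    # value columns, in file order after DATE and TIME
my %data;

# Sample lines from the question; a real script would read them from a file
my @input = (
    "DATE    TIME    A   B   C",
    "11/22/14    21:00:00    5,854   2,105   1,290",
    "11/22/14    21:02:35    7,692   2,593   2,649",
    "11/22/14    21:05:10    1,639   458 444",
);

shift @input;    # drop the header line

for my $line (@input) {
    ( my $clean = $line ) =~ tr/,//d;    # strip thousands separators
    my ( $date, $time, @values ) = split ' ', $clean;

    # One pass over the columns instead of three copies of the same if block
    for my $i ( 0 .. $#columns ) {
        my $col = $columns[$i];
        if ( not exists $data{$col} or $values[$i] > $data{$col}{value} ) {
            $data{$col} = { value => $values[$i], date => $date, time => $time };
        }
    }
}

for my $col (@columns) {
    printf "Max %s: %s at %s %s\n", $col, @{ $data{$col} }{qw(value date time)};
}
```

The trade-off is the one mentioned above: the loop removes the repetition, at the cost of an extra level of indirection through @columns.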

Also, remember not to include your header line in your data.

If $data{A}->{value} confuses you, you can use three separate hashes: one to store the date, one to store the time, and one to store the value.

my %times;
my %dates;
my %values;
while ( my $line .... ) {
    $line =~ tr/,//d;    # remove thousands separators so the values compare as numbers
    my ( $date, $time, $a_value, $b_value, $c_value ) = split /\s+/, $line;
    if ( not exists $values{A} or $a_value > $values{A} ) {
        $values{A} = $a_value;
        $dates{A}  = $date;
        $times{A}  = $time;
    }
    ...    # Same for B and C
}

Upvotes: 1

mpapec

Reputation: 50657

This resembles a Schwartzian transform in that each line is split only once, which avoids some overhead, and it uses reduce() from the core List::Util module to pick just the line with the maximum value:

use strict;
use warnings;
use List::Util 'reduce';

(undef, my @tmp) = map { tr/,//d; [ $_, split ] } <DATA>;

my ($max_a) = 
  map $_->[0],
  reduce {
    $a->[3] > $b->[3] ? $a : $b
  }
  @tmp;

my ($max_b) = 
  map $_->[0],
  reduce {
    $a->[4] > $b->[4] ? $a : $b
  }
  @tmp;

my ($max_c) = 
  map $_->[0],
  reduce {
    $a->[5] > $b->[5] ? $a : $b
  }
  @tmp;

print
  "maxA: ", $max_a,
  "maxB: ", $max_b,
  "maxC: ", $max_c;

__DATA__
DATE    TIME    A   B   C
11/22/14    21:00:00    5,854   2,105   1,290
11/22/14    21:02:35    7,692   2,593   2,649
11/22/14    21:05:10    1,639   458 444
11/22/14    21:07:00    1,032   487 434
11/22/14    21:08:15    4,707   1,352   646
11/22/14    21:10:22    351 46  162
11/22/14    21:10:55    5,507   1,943   957
11/22/14    21:11:00    1,703   647 516
11/22/14    21:12:00    2,359   751 785
11/22/14    21:14:05    67  25  44
11/22/14    21:16:25    4,072   1,596   1,050
11/22/14    21:17:48    5,060   2,131   1,996
11/22/14    21:19:00    341 42  137
11/22/14    21:23:00    1,308   71  634

output

maxA: 11/22/14    21:02:35    7692   2593   2649
maxB: 11/22/14    21:02:35    7692   2593   2649
maxC: 11/22/14    21:02:35    7692   2593   2649

Some refactoring,

sub get_max {
  my ($pos, $r) = @_;

  return map $_->[0],
    reduce {
      $a->[$pos] > $b->[$pos] ? $a : $b
    }
    @$r;
}

my ($max_a) = get_max(3, \@tmp);
my ($max_b) = get_max(4, \@tmp);
my ($max_c) = get_max(5, \@tmp);
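As a rough, self-contained sketch of how the refactored helper could be driven by column name: the %index_of map and %max_line hash are hypothetical names added for illustration, the three rows are sample lines from the question, and here the original line is kept untouched so the commas survive in the output.

```perl
use strict;
use warnings;
use List::Util 'reduce';

# Return the original line of the row with the largest value at index $pos
sub get_max {
    my ( $pos, $rows ) = @_;
    my $best = reduce { $a->[$pos] > $b->[$pos] ? $a : $b } @$rows;
    return $best->[0];
}

# Build rows shaped like [ original line, DATE, TIME, A, B, C ]
my @tmp;
for my $line (
    "11/22/14    21:00:00    5,854   2,105   1,290",
    "11/22/14    21:02:35    7,692   2,593   2,649",
    "11/22/14    21:10:22    351 46  162",
) {
    ( my $clean = $line ) =~ tr/,//d;    # compare on comma-free numbers
    push @tmp, [ $line, split ' ', $clean ];
}

my %index_of = ( A => 3, B => 4, C => 5 );    # hypothetical name-to-index map
my %max_line = map { $_ => get_max( $index_of{$_}, \@tmp ) } sort keys %index_of;

print "max$_: $max_line{$_}\n" for sort keys %max_line;
```

Index 0 holds the untouched line, so the numeric columns start at index 3, the same offsets used in the calls above.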

Upvotes: 1
