Reputation: 17
I am trying to get the highest value for columns A, B, and C and tie those values to the dates and times they occurred. To get the max value I just need to use the List::Util module, but I am not sure how to point it at a particular column in the array and then reference the date and time.
DATE TIME A B C
11/22/14 21:00:00 5,854 2,105 1,290
11/22/14 21:02:35 7,692 2,593 2,649
11/22/14 21:05:10 1,639 458 444
11/22/14 21:07:00 1,032 487 434
11/22/14 21:08:15 4,707 1,352 646
11/22/14 21:10:22 351 46 162
11/22/14 21:10:55 5,507 1,943 957
11/22/14 21:11:00 1,703 647 516
11/22/14 21:12:00 2,359 751 785
11/22/14 21:14:05 67 25 44
11/22/14 21:16:25 4,072 1,596 1,050
11/22/14 21:17:48 5,060 2,131 1,996
11/22/14 21:19:00 341 42 137
11/22/14 21:23:00 1,308 71 634
Upvotes: 1
Views: 1138
Reputation: 6378
Here is a mechanical, procedural, and slightly more advanced than "baby perl" approach. The script is fairly simple and sticks to known perl "idioms".
First we slurp all of DATA into @lines - an array of arrays - using map - we could use a while (<DATA>){...} loop here instead. We remove the commas while we are here (tr/,//d).
We then use per-column temporary arrays (@atmp, @btmp, ...) and populate them with a direct sort of @lines, accessing the relevant column in the internal anonymous array ($a->[n] ...) for the sort operation: this way we don't use a module and avoid using map.
Once we have (reverse) sorted the arrays by column, we can print the first element to get the highest value for each column. To print, we dereference the first element (e.g. @{$atmp[0]} since it is an anonymous array) to get back the whole line - that way we keep the "highest value" together with the other columns for our output.
NB to highlight the columnar sort I changed the original data so that different lines appear as the maximum for each column. In the original data the second line has the highest value for all three columns. I'm using tr|,||d instead of tr/,//d only to unbreak the SO syntax highlighter.
use v5.16; # enables strict (but not warnings)
use warnings;
my @lines = map { tr|,||d; [ split ] } <DATA> ;
shift @lines; # removes header
my @atmp = sort{ $b->[2]<=>$a->[2] } @lines;
my @btmp = sort{ $b->[3]<=>$a->[3] } @lines;
my @ctmp = sort{ $b->[4]<=>$a->[4] } @lines ;
print "\t DATE TIME A B C \n";
print "Max A: @{$atmp[0]}\nMax B: @{$btmp[0]}\nMax C: @{$ctmp[0]}\n" ;
__DATA__
DATE TIME A B C
11/22/14 21:00:00 5,854 2,105 1,290
11/22/14 21:02:35 7,692 2,593 2,649
11/22/14 21:05:10 1,639 458 444
11/22/14 21:07:00 1,032 487 434
11/22/14 21:08:15 4,707 1,352 6460
11/22/14 21:10:22 351 46 162
11/22/14 21:10:55 5,507 9,943 957
11/22/14 21:11:00 1,703 647 516
11/22/14 21:12:00 2,359 751 785
11/22/14 21:14:05 67 25 44
11/22/14 21:16:25 4,072 1,596 1,050
11/22/14 21:17:48 5,060 2,131 1,996
11/22/14 21:19:00 341 42 137
11/22/14 21:23:00 1,308 71 634
output:
~/$ perl sort_by_column.pl
DATE TIME A B C
Max A: 11/22/14 21:02:35 7692 2593 2649
Max B: 11/22/14 21:10:55 5507 9943 957
Max C: 11/22/14 21:08:15 4707 1352 6460
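The explanation above notes that a while loop could replace the map when reading the data. A minimal sketch of that variant follows, assuming a small in-memory sample ($text, a hypothetical stand-in for the __DATA__ section) read through an in-memory filehandle so the snippet is self-contained; the variable names are illustrative and not from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the __DATA__ section.
my $text = <<'END';
DATE TIME A B C
11/22/14 21:00:00 5,854 2,105 1,290
11/22/14 21:02:35 7,692 2,593 2,649
11/22/14 21:10:22 351 46 162
END

# Open an in-memory filehandle so the loop can read line by line.
open my $fh, '<', \$text or die "Cannot open in-memory data: $!";

my @lines;
while ( my $line = <$fh> ) {
    next if $. == 1;                      # skip the header row
    $line =~ tr/,//d;                     # drop thousands separators
    push @lines, [ split ' ', $line ];    # one array ref per row
}
close $fh;

# Same columnar sort as in the answer: column A is index 2.
my @by_a = sort { $b->[2] <=> $a->[2] } @lines;
print "Max A: @{ $by_a[0] }\n";
```

The loop builds the same array of arrays as the map version, so the sort and print steps stay unchanged; the only difference is that the header is skipped while reading instead of being shifted off afterwards.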
use DDP; followed by p @lines, p @atmp etc. can be a helpful tool for "visualization". See Data::Printer for more details.
Upvotes: 1
Reputation: 107060
My understanding:
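You want three pieces of data at the end: the highest value in each of columns A, B, and C, together with the date and time at which it occurred.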
You don't want to sort the data. You don't need to store anything else.
You didn't specify any code, so I'm not going to give you a complete program. You need to try that yourself. Instead, I'll give you some hints:
You should read up on references and how they work. We'll store your data in a hash of hash references:
my %data;
$data{A}->{time} = "xxxxx"; # Time of highest item in column "A"
$data{A}->{date} = "xxxxx"; # Date of highest item in column "A"
$data{A}->{value} = "xxxx"; # Value of highest item in column "A"
Same with the other two columns.
You can loop through your data (is it a file? You didn't explain)
my %data;
while ( my $line .... ) {
    $line =~ tr/,//d;    # strip thousands separators so the values compare numerically
    my ( $date, $time, $a_value, $b_value, $c_value ) = split ' ', $line;
    if ( not exists $data{A}->{value} or $a_value > $data{A}->{value} ) {
        $data{A}->{value} = $a_value;
        $data{A}->{date} = $date;
        $data{A}->{time} = $time;
    }
    ... # Same for B and C
}
In this loop, I set $data{A}->{value} to the value I just read in if that value is higher than the one I had previously stored. It's a common way of looking for the highest value. I also need to store it if that value doesn't already exist. Thus, I check to see if $data{A}->{value} exists or not. If it doesn't, I need to store that value anyway. (I could have done just if ( not exists $data{A} or $a_value > $data{A}->{value} ) ).
After your while loop, %data
will contain the highest values of each column, and the date and time for those values. There's a lot of code repetition. I could have added an inner loop for each column, but it's not worth the effort with just three columns.
Also, remember not to include your header line in your data.
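The inner loop over columns mentioned above could look roughly like the following. This is only a sketch under stated assumptions: the rows are held in a hypothetical @rows array rather than read from a file, and @columns is an illustrative name for the column labels.

```perl
use strict;
use warnings;

# Hypothetical sample rows; in a real script these would come from a file.
my @rows = (
    "DATE TIME A B C",
    "11/22/14 21:00:00 5,854 2,105 1,290",
    "11/22/14 21:02:35 7,692 2,593 2,649",
    "11/22/14 21:17:48 5,060 2,131 1,996",
);

my @columns = qw(A B C);
my %data;

shift @rows;    # drop the header line
for my $row (@rows) {
    ( my $line = $row ) =~ tr/,//d;    # strip thousands separators
    my ( $date, $time, @values ) = split ' ', $line;

    # Inner loop: compare each column against its running maximum.
    for my $i ( 0 .. $#columns ) {
        my $col = $columns[$i];
        next if exists $data{$col} and $values[$i] <= $data{$col}{value};
        $data{$col} = { value => $values[$i], date => $date, time => $time };
    }
}

for my $col (@columns) {
    print "Max $col: $data{$col}{value} on $data{$col}{date} at $data{$col}{time}\n";
}
```

Because the comparison uses <=, a later row that merely ties the current maximum does not replace it, so the earliest occurrence is the one reported.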
If $data{A}->{value} confuses you, you can use three separate hashes: one to store the date, one to store the time, and one to store the value.
my %times;
my %dates;
my %values;
while ( my $line .... ) {
    $line =~ tr/,//d;    # strip thousands separators so the values compare numerically
    my ( $date, $time, $a_value, $b_value, $c_value ) = split ' ', $line;
    if ( not exists $values{A} or $a_value > $values{A} ) {
        $values{A} = $a_value;
        $dates{A} = $date;
        $times{A} = $time;
    }
    ... # Same for B and C
}
Upvotes: 1
Reputation: 50657
This resembles a Schwartzian transform in that each line is split only once, which avoids the overhead of repeated splitting. It uses reduce() from the core List::Util module to pick just the line with the maximum value:
use strict;
use warnings;
use List::Util 'reduce';
# Delete the thousands separators so the values compare numerically;
# slot 0 keeps the whole line, slots 3..5 hold columns A..C.
(undef, my @tmp) = map { tr/,//d; [ $_, split ] } <DATA>;
my ($max_a) =
map $_->[0],
reduce {
$a->[3] > $b->[3] ? $a : $b
}
@tmp;
my ($max_b) =
map $_->[0],
reduce {
$a->[4] > $b->[4] ? $a : $b
}
@tmp;
my ($max_c) =
map $_->[0],
reduce {
$a->[5] > $b->[5] ? $a : $b
}
@tmp;
print
"maxA: ", $max_a,
"maxB: ", $max_b,
"maxC: ", $max_c;
__DATA__
DATE TIME A B C
11/22/14 21:00:00 5,854 2,105 1,290
11/22/14 21:02:35 7,692 2,593 2,649
11/22/14 21:05:10 1,639 458 444
11/22/14 21:07:00 1,032 487 434
11/22/14 21:08:15 4,707 1,352 646
11/22/14 21:10:22 351 46 162
11/22/14 21:10:55 5,507 1,943 957
11/22/14 21:11:00 1,703 647 516
11/22/14 21:12:00 2,359 751 785
11/22/14 21:14:05 67 25 44
11/22/14 21:16:25 4,072 1,596 1,050
11/22/14 21:17:48 5,060 2,131 1,996
11/22/14 21:19:00 341 42 137
11/22/14 21:23:00 1,308 71 634
output
maxA: 11/22/14 21:02:35 7692 2593 2649
maxB: 11/22/14 21:02:35 7692 2593 2649
maxC: 11/22/14 21:02:35 7692 2593 2649
Some refactoring,
sub get_max {
my ($pos, $r) = @_;
return map $_->[0],
reduce {
$a->[$pos] > $b->[$pos] ? $a : $b
}
@$r;
}
my ($max_a) = get_max(3, \@tmp);
my ($max_b) = get_max(4, \@tmp);
my ($max_c) = get_max(5, \@tmp);
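For readers who want to try the reduce idea in isolation, here is a minimal self-contained sketch. It assumes the rows are already split and comma-free, and it returns the whole row so the date and time can be read next to the value. The names row_with_max, @rows and %index_of are hypothetical and not part of List::Util or the code above.

```perl
use strict;
use warnings;
use List::Util 'reduce';

# Hypothetical sample rows (commas already removed), one array ref per row:
# index 0 = date, 1 = time, 2 = A, 3 = B, 4 = C.
my @rows = (
    [ '11/22/14', '21:00:00', 5854, 2105, 1290 ],
    [ '11/22/14', '21:02:35', 7692, 2593, 2649 ],
    [ '11/22/14', '21:10:22', 351,  46,   162 ],
);

# row_with_max is a hypothetical helper name, not a List::Util function.
sub row_with_max {
    my ( $pos, $rows ) = @_;

    # reduce keeps whichever of the two rows has the larger value at $pos.
    return reduce { $a->[$pos] > $b->[$pos] ? $a : $b } @$rows;
}

my %index_of = ( A => 2, B => 3, C => 4 );
for my $col ( sort keys %index_of ) {
    my $row = row_with_max( $index_of{$col}, \@rows );
    printf "max%s: %s on %s at %s\n",
        $col, $row->[ $index_of{$col} ], $row->[0], $row->[1];
}
```

Returning the array reference rather than a preformatted line leaves the caller free to pick out the date, the time, or the value as needed.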
Upvotes: 1