Reputation: 141
What is the best way to set up a range counter in a for loop? I have a tab-delimited input file where the first 2 columns are important. I would like to find the min and max scores that occur within each range of Pos values. So for the sample input file:
Pos Score
1 5
2 17
9 80
38 22
40 11
7 0
302 19
85 33
12 51
293 1
5 19
61 8
71 15
I need to calculate the min and max scores for each range, if they exist.
1-29 (min=?, max=?)
30-59 (min=?, max=?)
60-89 (min=?, max=?)
Expected results:
1-29 (min=0, max=80)
30-59 (min=11, max=22)
60-89 (min=8, max=33)
290-319 (min=1, max=19)
There was another thread related to this, but it only counts occurrences within a set range. My attempt was to set up a for loop:
use List::MoreUtils qw( minmax );
my %inputhash;
my %storehash;
open (FF,$inputfile) || die "Cannot open file $inputfile";
while(<FF>) {
    next if $. < 2; #use to trim off first line if there is a header
    my ($Pos, $Score) = split;
    $inputhash{$Pos} = $Score;
}
for (my $x=1; $x<1600; $x+29) #set to 1600 for now
{
    my $low = $x;
    my $high = $x+29;
    foreach my $i ($low...$high)
    {
        if (exists $inputhash{$i})
        {
            my $score = $inputhash{$i};
            push (@{$storehash{$high}}, $score);
        }
    }
}
foreach my $range (sort {$a <=> $b} keys %storehash)
{
    my ($minrange, $maxrange) = minmax @{$storehash{$range}};
    print "$range: $minrange, $maxrange\n";
}
Is there a better way to handle this? This current implementation gives me an error: Useless use of addition (+) in void context.
Upvotes: 3
Views: 1345
Reputation: 50657
Using the command line:
perl -ane'
    /\d/ or next;
    $i = int($F[0] /30);
    (!defined or $_ >$F[1]) and $_ = $F[1] for $r[$i]{m};
    (!defined or $_ <$F[1]) and $_ = $F[1] for $r[$i]{M};
}{
    printf("%d-%d (min=%d, max=%d)\n", $_*30, $_*30+29, $r[$_]{m}, $r[$_]{M})
        for grep $r[$_], 0 .. $#r;
' file
output
0-29 (min=0, max=80)
30-59 (min=11, max=22)
60-89 (min=8, max=33)
270-299 (min=1, max=1)
300-329 (min=19, max=19)
Script equivalent of the command-line version:
my @r;
while (<>) {
    /\d/ or next;
    my @F = split;
    my $i = int($F[0] /30);
    # min topicalizer, refer to $r[$i]{m} as $_
    for ($r[$i]{m}) {
        $_ = $F[1] if !defined or $_ >$F[1];
    }
    # max topicalizer
    for ($r[$i]{M}) {
        $_ = $F[1] if !defined or $_ <$F[1];
    }
}
for (grep $r[$_], 0 .. $#r) {
    printf("%d-%d (min=%d, max=%d)\n", $_*30, $_*30+29, $r[$_]{m}, $r[$_]{M});
}
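The for ($r[$i]{m}) { ... } blocks above rely on foreach aliasing: looping over a single element makes $_ an alias for that element, so assigning to $_ writes straight into @r. Here is a minimal standalone sketch of the same running min/max idiom. It assumes only the scores that the sample data places in the 0-29 range (5, 17, 80, 0, 51, 19), hard-coded in the order they appear in the file.

```perl
use strict;
use warnings;

# Scores from the sample data whose Pos falls in 0-29, in file order
my @scores_in_bucket_0 = (5, 17, 80, 0, 51, 19);

my @r;
for my $score (@scores_in_bucket_0) {
    # foreach over one element aliases $_ to $r[0]{m}, so "$_ = ..." updates it
    for ($r[0]{m}) { $_ = $score if !defined or $_ > $score }    # running minimum
    for ($r[0]{M}) { $_ = $score if !defined or $_ < $score }    # running maximum
}
print "min=$r[0]{m} max=$r[0]{M}\n";    # prints "min=0 max=80"
```

Because each score is compared as it is read, this approach never stores the full list of scores for a range. The other answers instead collect the scores and call min/max at the end.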
Upvotes: 1
Reputation: 126742
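The warning
Useless use of addition (+) in void context
should have alerted you that the last clause of your for loop is $x+29 instead of $x += 29. Because $x is never updated, the loop also never terminates. Apart from that, you have simple boundary errors on the ranges: with a step of 29 and an upper bound of $x+29, consecutive ranges share an endpoint (1-30, 30-59, 59-88, ...).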
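For comparison, here is a sketch of the question's hash-based loop with the step clause corrected. It makes several assumptions. The sample pairs are hard-coded into %inputhash instead of being read from $inputfile. Ranges start at 0 and advance by 30, so consecutive ranges no longer share an endpoint. Results are keyed by the lower bound rather than by $high. List::Util's min and max (a core module) stand in for List::MoreUtils's minmax.

```perl
use strict;
use warnings;
use List::Util qw(min max);    # used here instead of List::MoreUtils::minmax

# Sample data from the question, hard-coded in place of reading $inputfile
my %inputhash = (
    1  => 5,  2  => 17, 9   => 80, 38 => 22, 40 => 11, 7  => 0, 302 => 19,
    85 => 33, 12 => 51, 293 => 1,  5  => 19, 61 => 8,  71 => 15,
);

my %storehash;
for (my $x = 0; $x < 1600; $x += 30) {    # "+=" actually advances $x
    my $low  = $x;
    my $high = $x + 29;                   # inclusive bounds: 0-29, 30-59, ...
    foreach my $i ($low .. $high) {
        # only create an entry for ranges that contain at least one position
        push @{ $storehash{$low} }, $inputhash{$i} if exists $inputhash{$i};
    }
}

my @lines;
foreach my $low (sort { $a <=> $b } keys %storehash) {
    my @scores = @{ $storehash{$low} };
    push @lines, sprintf("%d-%d (min=%d, max=%d)",
        $low, $low + 29, min(@scores), max(@scores));
}
print "$_\n" for @lines;
```

This still scans every position from 0 to 1599. The division-based approach avoids that by computing each position's range directly.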
If your ranges all have the same width, the easiest way is to calculate the range for each position by simple division and build a list of scores for each range. The minimum and maximum in each range can be determined afterwards.
This solution uses a constant WIDTH to determine the size of each range; in this case it is 30.
use strict;
use warnings;
use autodie;
use List::MoreUtils 'minmax';
use constant WIDTH => 30;
<>; # lose the header
my @buckets;
while (<>) {
    my ($pos, $score) = split;
    push @{ $buckets[$pos / WIDTH] }, $score;
}
for my $i (0 .. $#buckets) {
    next unless my $contents = $buckets[$i];
    my $start = $i * WIDTH;
    printf "%d-%d (min=%d, max=%d)\n",
        $start, $start + WIDTH - 1,
        minmax @$contents;
}
output
0-29 (min=0, max=80)
30-59 (min=11, max=22)
60-89 (min=8, max=33)
270-299 (min=1, max=1)
300-329 (min=19, max=19)
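The line push @{ $buckets[$pos / WIDTH] }, $score works without an explicit int() because Perl truncates a fractional array subscript to an integer. The following small sketch shows where each position lands. Most of the positions are hypothetical values chosen to sit on the range boundaries; 38, 293 and 302 come from the sample data.

```perl
use strict;
use warnings;
use constant WIDTH => 30;

# A fractional subscript such as 38 / 30 (about 1.27) is truncated to 1,
# so $buckets[$pos / WIDTH] behaves like $buckets[int($pos / WIDTH)].
my @buckets;
for my $pos (1, 29, 30, 38, 59, 60, 293, 302) {
    push @{ $buckets[$pos / WIDTH] }, $pos;
}

for my $i (0 .. $#buckets) {
    next unless $buckets[$i];    # skip ranges that received no positions
    printf "bucket %d (%d-%d): %s\n",
        $i, $i * WIDTH, ($i + 1) * WIDTH - 1, join(',', @{ $buckets[$i] });
}
```

Positions 29 and 59 fall into the same bucket as the start of their range, and 30 and 60 start new ones, so the ranges are 0-29, 30-59, 60-89 and so on without overlap.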
Upvotes: 1
Reputation: 98078
use strict;
use warnings;
use List::Util qw(max min);
my $step = 30; # group into 30 item ...
my @bins;      # ... bins
<DATA>;        # skip line
while (<DATA>) {
    my ($p, $s) = split;
    push @{$bins[$p / $step]}, $s;
}
for (my $i = 0; $i < @bins; $i++) {
    next if not $bins[$i];
    printf("%d, %d (min %d, max %d)\n",
        $i * $step, ($i + 1) * $step,
        min(@{$bins[$i]}), max(@{$bins[$i]}));
}
__DATA__
Pos Score
1 5
2 17
9 80
38 22
40 11
7 0
302 19
85 33
12 51
293 1
5 19
61 8
71 15
output
0, 30 (min 0, max 80)
30, 60 (min 11, max 22)
60, 90 (min 8, max 33)
270, 300 (min 1, max 1)
300, 330 (min 19, max 19)
Upvotes: 2
Reputation: 7912
If you store your data in an array instead of a hash:
$inputarray[$Pos] = $Score;
you can use minmax on an array slice (after stripping out any undefined values):
my ($min, $max) = minmax grep {defined} @inputarray[0..3];
For example:
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw(minmax);
use List::Util qw(min);
my @inputarray;
<DATA>;
while (<DATA>) {
    my ($pos, $score) = split;
    $inputarray[$pos] = $score;
}
# Step by 30 so that consecutive ranges (0-29, 30-59, ...) don't overlap
for (my $i = 0; $i < @inputarray; $i += 30) {
    my $end = min($i + 29, $#inputarray); # Don't overrun the end of the array.
    my ($min, $max) = minmax grep {defined} @inputarray[$i..$end];
    print "$i-$end (min=$min,max=$max)\n" if defined $min;
}
__DATA__
Pos Score
1 5
2 17
9 80
38 22
40 11
7 0
302 19
85 33
12 51
293 1
5 19
61 8
71 15
Output:
0-29 (min=0,max=80)
30-59 (min=11,max=22)
60-89 (min=8,max=33)
270-299 (min=1,max=1)
300-302 (min=19,max=19)
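The grep {defined} step matters because a slice of a sparse array returns undef for every index that was never assigned. Here is a minimal sketch using four of the sample positions. List::Util's min and max are used here in place of minmax; either works on the filtered list.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Sparse array: only indices 1, 2, 7 and 9 are assigned; 0, 3-6 and 8 stay undef
my @inputarray;
@inputarray[1, 2, 7, 9] = (5, 17, 0, 80);

my @slice   = @inputarray[0 .. 9];        # 10 elements, undef for the gaps
my @present = grep { defined } @slice;    # (5, 17, 0, 80), in index order
printf "slice: %d elements, %d defined; min=%d, max=%d\n",
    scalar @slice, scalar @present, min(@present), max(@present);
```

Without the grep, the undef entries would be treated as 0 (with "uninitialized value" warnings under use warnings), so the minimum of this slice would come out as 0 even if no score of 0 existed.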
Upvotes: 3