user2674514

Reputation: 141

Perl for loop for multiple ranges

What is the best way to set up a range counter in a for loop? I have a tab-delimited input file where only the first two columns matter. I would like to find the min and max Score values that fall within each range of Pos values. For the sample input file:

Pos     Score
1       5
2       17
9       80
38      22
40      11
7       0
302     19
85      33
12      51
293     1
5       19
61      8
71      15

I need to calculate the min and max scores for each range, if they exist.

1-29 (min=?, max=?)
30-59 (min=?, max=?)
60-89 (min=?, max=?)

Expected results:

1-29 (min=0, max=80)
30-59 (min=11, max=22)
60-89 (min=8, max=33)
290-319 (min=1, max=19)

There was another thread related to this, but it only counts occurrences within a set range. My attempt was to set up a for loop:

use List::MoreUtils qw( minmax );
my %inputhash;
my %storehash;

open (FF,$inputfile) || die "Cannot open file $inputfile";
while(<FF>) {
    next if $. < 2; #use to trim off first line if there is a header
    my ($Pos, $Score)  = split;
    $inputhash{$Pos} = $Score;
}


for (my $x=1; $x<1600; $x+29) #set to 1600 for now
{
    my $low = $x;
    my $high = $x+29;
    foreach my $i ($low...$high)
    {
        if (exists $inputhash{$i})
        {
            my $score = $inputhash{$i};
            push (@{$storehash{$high}}, $score);
        }
    }
} 

foreach my $range (sort {$a <=> $b} keys %storehash)
{
    my ($minrange, $maxrange) = minmax @{$storehash{$range}};
    print "$range: $minrange, $maxrange\n";
}

Is there a better way to handle this? This current implementation gives me an error: Useless use of addition (+) in void context.

Upvotes: 3

Views: 1345

Answers (4)

mpapec

Reputation: 50657

Using the command line:

perl -ane'
  /\d/ or next;
  $i = int($F[0] /30);
  (!defined or $_ >$F[1]) and $_ = $F[1] for $r[$i]{m};
  (!defined or $_ <$F[1]) and $_ = $F[1] for $r[$i]{M};
  }{
  printf("%d-%d (min=%d, max=%d)\n", $_*30, $_*30+29, $r[$_]{m}, $r[$_]{M})
    for grep $r[$_], 0 .. $#r;
' file

output

0-29 (min=0, max=80)
30-59 (min=11, max=22)
60-89 (min=8, max=33)
270-299 (min=1, max=1)
300-329 (min=19, max=19)

Script equivalent of the command-line version:

my @r;
while (<>) {
  /\d/ or next;
  my @F = split;
  my $i = int($F[0] /30);
  # min topicalizer, refer to $r[$i]{m} as $_
  for ($r[$i]{m}) {
    $_ = $F[1] if !defined or $_ >$F[1];
  }
  # max topicalizer
  for ($r[$i]{M}) {
    $_ = $F[1] if !defined or $_ <$F[1];
  }
}

for (grep $r[$_], 0 .. $#r) {
  printf("%d-%d (min=%d, max=%d)\n", $_*30, $_*30+29, $r[$_]{m}, $r[$_]{M});
}
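
The "topicalizer" trick above relies on for aliasing $_ to the element it loops over, so assigning to $_ writes straight into the hash slot. As a minimal sketch of just that step (the helper name update_minmax is hypothetical and not part of the answer):

```perl
use strict;
use warnings;

# Hypothetical helper: fold one score into a { m => min, M => max } record.
sub update_minmax {
    my ($rec, $score) = @_;
    # for aliases $_ to $rec->{m}; assigning to $_ updates the hash entry
    for ($rec->{m}) { $_ = $score if !defined or $_ > $score }
    for ($rec->{M}) { $_ = $score if !defined or $_ < $score }
    return $rec;
}

my %rec;
update_minmax(\%rec, $_) for 22, 11, 15;
print "min=$rec{m}, max=$rec{M}\n";    # prints "min=11, max=22"
```

The same two loops appear in the script version, applied to $r[$i] instead of a standalone record.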

Upvotes: 1

Borodin

Reputation: 126742

The error message

Useless use of addition (+) in void context

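should have alerted you that the last clause of your for loop is $x+29 instead of $x += 29. Apart from that, you have simple boundary errors on the ranges: each one runs from $x to $x+29 inclusive, so with a step of 29 consecutive ranges would share a position.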
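
As a minimal sketch of those two fixes applied to the loop from the question (the helper name ranges_minmax is hypothetical, and starting the ranges at 0 rather than 1 is an assumption made so that they line up with 30-59 and 60-89):

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: $scores is a hash ref of position => score,
# $limit is the exclusive upper bound for range starts.
sub ranges_minmax {
    my ($scores, $width, $limit) = @_;
    my @rows;
    for (my $x = 0; $x < $limit; $x += $width) {    # note +=, not +
        my $high  = $x + $width - 1;                # inclusive upper end, no overlap
        my @found = map  { $scores->{$_} }
                    grep { exists $scores->{$_} } $x .. $high;
        next unless @found;                         # skip empty ranges
        push @rows, sprintf '%d-%d (min=%d, max=%d)',
            $x, $high, min(@found), max(@found);
    }
    return @rows;
}

my %inputhash = (1 => 5, 2 => 17, 9 => 80, 38 => 22, 40 => 11, 7 => 0,
                 302 => 19, 85 => 33, 12 => 51, 293 => 1, 5 => 19,
                 61 => 8, 71 => 15);
print "$_\n" for ranges_minmax(\%inputhash, 30, 1600);
```

This still probes every position up to the limit, which is why computing the range from the position directly is the simpler route.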

If your ranges all have the same width, then the easiest way is to calculate the range for each position by simple division and build a list of scores for each range. The minimum and maximum in each range can be determined afterwards.

This solution uses a constant WIDTH to determine the size of each range; in this case it is 30.

use strict;
use warnings;
use autodie;

use List::MoreUtils 'minmax';
use constant WIDTH => 30;

<>; # lose the header

my @buckets;
while (<>) {
  my ($pos, $score) = split;
  push @{ $buckets[$pos / WIDTH] }, $score;
}

for my $i (0 .. $#buckets) {
  next unless my $contents = $buckets[$i];
  my $start = $i * WIDTH;
  printf "%d-%d (min=%d, max=%d)\n",
      $start, $start + WIDTH - 1,
      minmax @$contents;
}

output

0-29 (min=0, max=80)
30-59 (min=11, max=22)
60-89 (min=8, max=33)
270-299 (min=1, max=1)
300-329 (min=19, max=19)

Upvotes: 1

perreal

Reputation: 98078

use strict;
use warnings;

use List::Util qw(max min);

my $step = 30;  # group into 30 item ...
my @bins;       # ... bins

<DATA>;         # skip line
while (<DATA>) {
  my ($p, $s) = split;
  push @{$bins[$p / $step]}, $s; 
}

for (my $i = 0; $i < @bins; $i++) {
    next if not $bins[$i];
    printf("%d, %d  (min %d, max %d)\n", 
        $i * $step, ($i + 1) * $step, 
        min(@{$bins[$i]}), max(@{$bins[$i]}));
}

__DATA__
Pos     Score
1       5
2       17
9       80
38      22
40      11
7       0
302     19
85      33
12      51
293     1
5       19
61      8
71      15

output

0, 30  (min 0, max 80)
30, 60  (min 11, max 22)
60, 90  (min 8, max 33)
270, 300  (min 1, max 1)
300, 330  (min 19, max 19)

Upvotes: 2

RobEarl

Reputation: 7912

You could store your data in an array, indexed by position, instead of a hash:

$inputarray[$Pos] = $Score;

You can use minmax on an array slice (after stripping out any undefined values):

my ($min, $max) = minmax grep {defined} @inputarray[0..3];

For example:

#!/usr/bin/perl
use strict;
use warnings;

use List::MoreUtils qw(minmax);
use List::Util qw(min);

my @inputarray;
<DATA>;
while (<DATA>) {
    my ($pos, $score) = split;
    $inputarray[$pos] = $score;
}

for (my $i = 1; $i < @inputarray; $i += 29) {
    my $end = min($i + 29, $#inputarray); # Don't overrun the end of the array.
    my ($min, $max) = minmax grep {defined} @inputarray[$i..$end];
    print "$i-$end (min=$min,max=$max)\n" if defined $min;
}

__DATA__
Pos     Score
1       5
2       17
9       80
38      22
40      11
7       0
302     19
85      33
12      51
293     1
5       19
61      8
71      15

Output:

1-30 (min=0,max=80)
30-59 (min=11,max=22)
59-88 (min=8,max=33)
291-302 (min=1,max=19)
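
Note that stepping by 29 while each slice covers $i to $i + 29 makes neighbouring ranges share a position: 30 falls in both 1-30 and 30-59 above. A possible variant, sketched under the assumption that ranges should be aligned to multiples of 30 (the helper name slice_minmax is hypothetical), steps by the same width that each slice covers:

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: @$scores is indexed by position, undef where no score exists.
sub slice_minmax {
    my ($scores, $width) = @_;
    my @rows;
    for (my $i = 0; $i < @$scores; $i += $width) {
        my $end   = min($i + $width - 1, $#$scores);   # don't overrun the array
        my @found = grep { defined } @{$scores}[$i .. $end];
        next unless @found;                            # skip empty ranges
        push @rows, sprintf '%d-%d (min=%d,max=%d)',
            $i, $end, min(@found), max(@found);
    }
    return @rows;
}

my @inputarray;
@inputarray[1, 2, 9, 38, 40, 7, 302, 85, 12, 293, 5, 61, 71]
    = (5, 17, 80, 22, 11, 0, 19, 33, 51, 1, 19, 8, 15);
print "$_\n" for slice_minmax(\@inputarray, 30);
```

With this alignment the last range is clipped to the highest position seen, so it is reported as ending at that position rather than at a multiple of 30.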

Upvotes: 3
