SSh
SSh

Reputation: 189

Perl fast checking of overlapping intervals?

I am trying to find overlapping of intervals. I have an interval 1000 to 5000 (just given one for example). This is checked in intervals given below. This script does work but is very slow with thousands of intervals to check. Is there any way to make this faster? Thanks

#!/usr/bin/perl
use warnings;
use strict;
use v5.16;
use List::MoreUtils qw/ any /;

my $start = 1000;
my $end   = 5000;

while ( my $line = <DATA> ) {
    chomp $line;
    my @element = split "\t", $line;
    my @checking_array = "";
    for my $checking_no ( $element[0] .. $element[1] ) {
        push @checking_array, $checking_no;
    }
    for my $value ( $start .. $end ) {
        if ( any { $_ eq $value } @checking_array ) {
            print "$start to $end found in $line\n";
            last;
        }
        else { next }
    }
}

__DATA__
780895  781139
3707570 3707794
13753925    13754168
2409582 2409790
6360880 6361084
8261045 8261250
4133539 4133772
7731897 7732188
8660252 8660539
12156253    12156504
9136875 9137168
16657849    16658107
5000    6000
4133539 4133772
7731897 7732188
8660252 8660539
4999    10000
12156253    12156504
3707570 3707794
13753925    13754168
2409582 2409790
6360880 6361084

Outputs:

1000 to 5000 found in 5000  6000
1000 to 5000 found in 4999  10000

Upvotes: 3

Views: 690

Answers (2)

ikegami
ikegami

Reputation: 385734

You never need the numbers in between the bounds!!!! Just check against the bounds.

         S---------E
 L-----H                      No overlap
      L-----H                 Overlap
           L-----H            Overlap
                L-----H       Overlap
                     L----H   No overlap
      L---------------H       Overlap

So they overlap except when H<S or L>E.

while ( my $line = <DATA> ) {
    chomp $line;
    my ($lo, $hi) = split "\t", $line;
    if ( $lo <= $end && $hi >= $start ) {
        print "$start to $end found in $line\n";
    }
}

Upvotes: 6

Borodin
Borodin

Reputation: 126722

There's no need to check every value between $start and $end; you can simply compare the limits of the two ranges. I think this code is fairly straightforward

#!/usr/bin/perl

use strict;
use warnings 'all';

my $start = 1000;
my $end   = 5000;

while ( my $line = <DATA> ) {

    my ($low, $high) = split ' ', $line;

    unless ( $high < $start or $low > $end ) {
        chomp $line;
        print qq{$start to $end found in "$line"\n};
    }
}

__DATA__
780895  781139
3707570 3707794
13753925    13754168
2409582 2409790
6360880 6361084
8261045 8261250
4133539 4133772
7731897 7732188
8660252 8660539
12156253    12156504
9136875 9137168
16657849    16658107
5000    6000
4133539 4133772
7731897 7732188
8660252 8660539
4999    10000
12156253    12156504
3707570 3707794
13753925    13754168
2409582 2409790
6360880 6361084

output

1000 to 5000 found in "5000    6000"
1000 to 5000 found in "4999    10000"

Upvotes: 3

Related Questions