Shikari
Shikari

Reputation: 67

Extract the unique intervals from two arrays in perl?

I am trying to extract non-overlapping intervals from two files with intervals (those that are unique). Here the case:

file1.txt

Start End
1 3
5 9
13 24
34 57

file2.txt

Start End
6 7
10 12
16 28
45 68

Expected result: an array having those intervals with elements present only in one file:

1-3 , 10-12

That's all... thank you very much in advance!

Upvotes: 2

Views: 99

Answers (2)

Borodin
Borodin

Reputation: 126722

This program does as you ask. It loads all the ranges into @pairs (there is no need to distinguish between the contents of file1 and file2) and copies that list into the array @unique. Every possible combination of two ranges is then tested to see if they overlap, and both ranges are deleted from @unique if so.

The remaining contents of @unique is the list of ranges that is required. I have displayed it using both Data::Dump, in case you need to process the result further, and using print, so that you can see that the output matches the required result in your question.

use strict;
use warnings;

our @ARGV = qw/ file1.txt file2.txt /;

my @ranges;

while (<>) {
  my @pair = /\d+/g;
  next unless @pair == 2;
  push @ranges, \@pair;
}

my @unique = @ranges;

for my $i (0 .. $#unique) {
  for my $j ($i+1 .. $#unique) {
    if ($unique[$i][0] <= $unique[$j][1] and $unique[$i][1] >= $unique[$j][0]) {
      ++$unique[$_][2] for $i, $j;
    }
  }
}

@unique = grep { not $_->[2] } @unique;


use Data::Dump;
dd \@unique;

print join(', ', map join('-', @$_), @unique), "\n";

output

[[1, 3], [10, 12]]
1-3, 10-12

Update

Using the data from @Choroba (thanks) the output is now

[[1, 3], [1000, 1001], [10, 12]]
1-3, 1000-1001, 10-12

which I think is correct.

Upvotes: 1

choroba
choroba

Reputation: 241858

Process the files line by line. If there's no overlap, report the interval that starts earlier and advance its file. In case of an overlap, advance both files.

#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;

my @F;
open $F[0], '<', 'file1.txt' or die $!;
open $F[1], '<', 'file2.txt' or die $!;

# Skip headers.
readline $_ for @F;

my @boundaries;
my @results;

sub earlier {
    my ($x, $y) = @_;
    if (! @{ $boundaries[$y] }
        or $boundaries[$x][1] < $boundaries[$y][0]
    ) {
        push @results, $boundaries[$x];
        $boundaries[$x] = [ split ' ', readline $F[$x] ];
        return 1
    }
    return 0
}

sub overlap {
    my ($x, $y) = @_;
    if ($boundaries[$x][1] < $boundaries[$y][1]) {
        do { $boundaries[$x] = [ split ' ', readline $F[$x] ] }
          until ! @{ $boundaries[$x] }
          or $boundaries[$x][0] > $boundaries[$y][1];
        $boundaries[$y] = [ split ' ', readline $F[$y] ];
        return 1
    }
    return 0
}

sub advance_both {
    @boundaries = map [ split ' ', readline $_ ], @F;
}

# init.
advance_both();
while (grep defined, @{ $boundaries[0] }, @{ $boundaries[1] }) {

    earlier(0, 1)
    or earlier(1, 0)
    or overlap(0, 1)
    or overlap(1, 0)
    or advance_both();
}

print join(' , ', map { join '-', @$_ } @results), "\n";

Upvotes: 3

Related Questions