Daniel Standage
Daniel Standage

Reputation: 8304

Determine non-overlapping locations with Perl

I have a collection of locations--here is an example of the data structure.

my $locations =
{
  loc_1 =>
  {
    start => 1,
    end   => 193,
  },
  loc_2 =>
  {
    start => 180,
    end   => 407,
  },
  loc_3 =>
  {
    start => 329,
    end   => 684,
  },
  loc_4 =>
  {
    start => 651,
    end   => 720,
  },
};

What is the best way to determine every possible combination of non-overlapping locations? The answer for this example would look something like this. Keep in mind there may be one or more locations, and these locations may or may not overlap.

my $non_overlapping_locations =
[
  {
    loc_1 =>
    {
      start => 1,
      end   => 193,
    },
    loc_3 =>
    {
      start => 329,
      end   => 684,
    },
  },
  {
    loc_1 =>
    {
      start => 1,
      end   => 193,
    },
    loc_4 =>
    {
      start => 651,
      end   => 720,
    },
  },
  {
    loc_2 =>
    {
      start => 180,
      end   => 407,
    },
    loc_4 =>
    {
      start => 651,
      end   => 720,
    },
  }
];

Update: ysth's response helped me see a flaw in my wording. I guess I'm not interested in //every possible// combination of non-overlapping locations, I'm only interested in the solutions that are not subsets of other solutions.

Upvotes: 2

Views: 493

Answers (4)

Dancrumb
Dancrumb

Reputation: 27539

I'm not a CS guy, so I'm not down with all of the best algorithms, but I wonder if there's a better approach than:

my @location_keys = keys %{$locations};
while (my $key_for_checking = (shift @location_keys) {
    foreach my $key_to_compare (@location_keys) {
        if ( do_not_overlap($locations->{$key_for_checking}, 
                            $locations->{$key_to_compare} ) {
            add_to_output($key_for_checking, $key_to_compare);
        }
    }
}

With do_not_overlap and add_to_output suitable defined.

If you're wondering about checking for overlap... that's pretty straightforward. A and B do not overlap if:

( (A->start < B->start) && (A->end < B->start) ) ||
( (A->start > B->end)   && (A->end > B->end) )

You may need to tweak depending on whether a shared boundary constitutes an overlap. Also, you can simplify this if you know whether A and B are sorted in some way (either by start or by end)

Upvotes: 1

Mihai Toader
Mihai Toader

Reputation: 12243

First i would gather all the individual points (the start and end of each location), sort them and keep them in a list. In your case that would be:

1,180,193,329,407,651,684,720. 

For each interval in that list find out how many segments are overlapping it. In your case this would be:

1, 180 -> 1
180, 193 -> 2
193, 329 -> 1
329, 407 -> 2
407, 651 -> 1
651, 684 -> 2
684, 720 -> 1

and loop at what segments have more than 1 (in this case there are 3). So the total number of cases is 2 x 2 x 2 = 8 solutions (you can pick only one segment maching a multi interval in a solution).

we found 2, 2, 2 (or 2, 3, 4). Keep them in an array and start from the last. Decrement it until you reach 0. When you reach 1 decrement the previous number and set the first number at the initial value minus 1.

Let's assume we have numbered the initial segments: (in this case 1,2,3,4,5,6). The overlapping segments will have the following segments in them [1,2], [2,3], [3,4]. So we have 3 overlapping segments. Now we start a recursive process of choice/elimination: At each step we are looking at an overlapping segment that has multiple segments. We iterate the choices and for each choice we do two things: eliminate from each subsequent overlapping segment the segments we didn't choose now, and force the current segment choice in each subsequent overlapping segments that have this choice as a possibility. Every segment that becomes a non overlapping will be treated as a new choice. Search for next multiple choice and recurse. Once we can't find a choice we have partial solution. We need to add to it the segments that aren't involved in any overlapping. Print it.

In this case it would go like this: First step:

we are here [1,2], [2,3], [3,4]:
  chose 1 -> // eliminate 2 from rest and force 1 (3 is a single choice so we do the same)
      [1], [3], [3] -> [1, 3] solution 
  chose 2 -> // eliminate 1 from the rest and force 2 (2 single choice so we do the same). 
      [2], [2], [4] -> [2, 4] solution

This should work properly.

Now the code implementing this (it's not the prettiest perl code around i assume but i'm really not a perl guy):

#!/bin/perl

use strict;
use warnings;
use 5.010;
use Data::Dumper;

my $locs = {
  loc_1 => {
    start => 1,
    end   => 193,
  },
  loc_2 => {
    start => 180,
    end   => 407,
  },
  loc_3 => {
    start => 329,
    end   => 684,
  },
  loc_4 => {
            start => 651,
    end   => 720,
  }
};

my (%starts, %ends);
map {
        my ($start, $end) = ($locs->{$_}->{start}, $locs->{$_}->{end});

        push @{ $starts{$start} }, $_;
        push @{ $ends{$end} }, $_;
} keys %$locs;

my @overlaps, my %tmp;

map {
        map { $tmp{$_} = 1 } @{$starts{$_}};
        map { delete $tmp{$_} } @{$ends{$_}};

        my @segs = keys %tmp;
        push @overlaps, \@segs if 1 < @segs
} sort (keys %starts, keys %ends);

sub parse_non_overlapping {
  my ($array,$pos)=($_[0], $_[1]);

  my @node = @{$array->[$pos]};
  foreach my $value ( @node ) {

    my @work = map { [@$_] } @$array;
    $work[$pos] = [ $value ];

    my ($removed, $forced) = ( {}, {$value => 1});
    map { $removed->{$_} = 1 if $_ ne $value } @node;

    my ($i, $new_pos) = (0, -1);
    for ( $i = $pos + 1; $i <= $#work; $i++ ) {
        $_ = $work[$i];

        #apply map
        @$_ = grep { not defined($removed->{$_}) } @$_;
        if ( $#$_ == 0 ) { $forced->{@$_[0]} = 1 }

        #apply force
            my @tmp = grep { defined $forced->{$_} } @$_;
        if ( $#tmp == 0 ) {
             map { $removed->{$_} = 1 if $tmp[0] ne $_ } @$_;
             @$_ = @tmp;
        }

        if ( $#$_ > 0 && $new_pos == -1 ) {
                $new_pos = $i;
        }

        $work[$i] = $_;
    }

    if ( $new_pos != -1 ) {
      parse_non_overlapping(\@work, $new_pos);
    } else {
      print Dumper \@work
       # @work has the partial solution minux completely non overlapping segments.
    }
  }
}    

parse_non_overlapping(\@overlaps, 0);

Upvotes: 1

ysth
ysth

Reputation: 98398

use strict;
use warnings;

my $locations =
{
  loc_1 =>
  {
    start => 1,
    end   => 193,
  },
  loc_2 =>
  {
    start => 180,
    end   => 407,
  },
  loc_3 =>
  {
    start => 329,
    end   => 684,
  },
  loc_4 =>
  {
    start => 651,
    end   => 720,
  },
};
my $non_overlapping_locations = [];
my @locations = sort keys %$locations;

get_location_combinations( $locations, $non_overlapping_locations, [], @locations );

use Data::Dumper;
print Data::Dumper::Dumper($non_overlapping_locations);

sub get_location_combinations {
    my ($locations, $results, $current, @remaining) = @_;

    if ( ! @remaining ) {
        if ( not_a_subset_combination( $results, $current ) ) {
            push @$results, $current;
        }
    }
    else {
        my $next = shift @remaining;
        if (can_add_location( $locations, $current, $next )) {
            get_location_combinations( $locations, $results, [ @$current, $next ], @remaining );
        }
        get_location_combinations( $locations, $results, [ @$current ], @remaining );
    }
}

sub can_add_location {
    my ($locations, $current, $candidate) = @_;

    # not clear if == is an overlap; modify to use >=  and <= if so.
    0 == grep $locations->{$candidate}{end} > $locations->{$_}{start} && $locations->{$candidate}{start} < $locations->{$_}{end}, @$current;
}

sub not_a_subset_combination {
    my ($combinations, $candidate) = @_;

    for my $existing (@$combinations) {
        my %candidate;
        @candidate{@$candidate} = ();
        delete @candidate{@$existing};
        if ( 0 == keys %candidate ) {
            return 0;
        }
    }
    return 1;
}

A relatively simple optimization would be to sort @locations by start and then end and pre-calculate and store in a hash (or just in $locations->{foo}) for each location how many of the following locations conflict with that location. Then in the can_add... case, splice that number off of @remaining before recursing.

Or pre-calculate for each location a hash of all following locations that conflict and strip them all out with a grep before recursing. (Though with that approach, having remaining be a hash begins to make more sense.)

Update: another approach to a solution would be to build up a tree of locations to exclude, where leaves represent solutions and inner nodes represent combinations that still have conflicts; the top node is all locations, and each node has children that represent removing one of the remaining conflicting locations that's greater (in some arbitrary ordering scheme) than the location removed by the parent node (if any).

Upvotes: 1

Hugmeir
Hugmeir

Reputation: 1259

(Real life intrudes - Apologies, I'll write an explanation - and get ride of those empty arrayrefs, although that's fairly trivial - later!)

#! /usr/bin/perl
use strict;
use warnings;
use 5.010;
use List::MoreUtils qw(any);
use Data::Dumper;

my $locations = {
    loc_1 => {
        start => 1,
        end   => 193,
    },
    loc_2 => {
        start => 180,
        end   => 407,
    },
    loc_3 => {
        start => 329,
        end   => 684,
    },
    loc_4 => {
        start => 651,
        end   => 720,
    },
};

my @keys = keys %$locations;

my %final;

for my $key (@keys) {
    push @{ $final{$key} }, map {
        if (   $locations->{$key}->{start} >= $locations->{$_}->{start}
            && $locations->{$key}->{start} <= $locations->{$_}->{end}
            or $locations->{$key}->{end} >= $locations->{$_}->{start}
            && $locations->{$key}->{end} <= $locations->{$_}->{end} )
        {
            ();
        }
        else {
            my $return = [ sort $key, $_ ];
            if ( any { $return ~~ $_ } @{ $final{$_} }, @{ $final{$key} } ) {
                ();
            }
            else { $return; }
        }
    } grep { $_ ne $key } keys %$locations;
}

say Dumper \%final;

Upvotes: 0

Related Questions