user3099140
user3099140

Reputation:

Positioning an ordered sequence of intervals for maximum alignment with another sequence of fixed intervals

I have two sequences of intervals.

The first is fixed and non-overlapping, so something like:

[1..10], [12..15], [23..56], [72..89], ...

The second is not fixed, so it's just an ordered list of interval lengths:

[7, 2, 5, 26, ...]

The task at hand is to:

Very simple example:

[25..26], [58..68], [74..76], [78..86]

[10, 12]

The optimal solution is to place the interval of length 10 at [58..68] and the interval of length 12 at [74..86] which results in only the numbers 25, 26, and 77 being in one list but not the other.

The only thing I've come up with that seems mildly helpful is that if I lay down the intervals in order, I know how many 'penalties' the interval I've already created, so I have an upper bound for the score, which means I have an admissible heuristic and I can do A* search instead of looking at the entire tree. However, the total range of numbers spans from 0 to about 34M, so I'd like something better.

Any help would be hot!

Upvotes: 6

Views: 449

Answers (1)

David Knipe
David Knipe

Reputation: 3454

OK, here's a half-thought-out answer. It should work in polynomial time, but I haven't bothered checking what the index is. It may well be possible to get a better index than the answer as outlined here. The details are left as an exercise to the reader :-) I hope it's not too unclear.

I'll define the score of a solution as the number of integers which appear in both lists of intervals. Let f(i,m) be the highest score it's possible to get using just the first i interval lengths, subject to the condition that none of your intervals goes above m. The function f, for fixed i, is essentially a (non-strictly) increasing function from the integers to a bounded subset of the integers. Therefore:

  • all values of f(i,m), for m > 0, are equal, with finitely many exceptions;
  • all values of f(i,m), for m < 0, are equal, with finitely many exceptions.

This means it's possible to represent all values of f(i,m) using a finite data structure (still considering a fixed value of i).

Now let F(i) be the value of this data structure representing all values of f(i,m). I claim that, given F(i), it is possible to calculate F(i+1). To do this, we only need to answer the following question for all x: If I place the new interval at x, how good is the best solution I can get? But we know what this is - it's just f(i,x) + the score we've got from this interval.

So if n is the number of intervals in the second list, the score of the best solution will be F(n).

To actually find the solution, you could work backwards from this.

You know what's the best score you can get. Say it's s_0. Then put the last interval as far left as possible, subject to the condition that it allows you to score s_0. That is, find the smallest m such that f(n,m) = s_0; and place the interval such that it only just stays inside the bound at m.

Then, let s_1 be the score you need to get from all the other intervals in order to get a total of s_0. Place the next-last interval as far left as possible, subject to the condition that you can still score s_1. That is, find the smallest m such that f(n,m) = s_1; and place the interval such that it only just stays inside the bound at m.

And so on...

Upvotes: 0

Related Questions