Greedy Attempt for covering all the numbers with the given intervals

Question

Let S be a set of n intervals over the natural numbers (the intervals may overlap), and let N be a list of n numbers.

I want to find the smallest subset P of S such that every number in N is contained in at least one interval of P. The intervals in P are allowed to overlap.

Trivial example:

S = {[1..4], [2..7], [3..5], [8..15], [9..13]}
N = [1, 4, 5]
// so P = {[1..4], [2..7]}
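
To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not an efficient solution (it is exponential in the number of intervals); it only serves as a reference for checking small cases such as the example above. Intervals are assumed to be closed and are represented as `(lo, hi)` tuples, and the function names are made up for illustration.

```python
from itertools import combinations


def covers(interval, q):
    """Return True if the closed interval (lo, hi) contains q."""
    lo, hi = interval
    return lo <= q <= hi


def smallest_cover_bruteforce(S, N):
    """Try subsets of S in increasing size and return the first one that
    covers every number in N, or None if no subset does."""
    for k in range(len(S) + 1):
        for P in combinations(S, k):
            if all(any(covers(i, q) for i in P) for q in N):
                return list(P)
    return None


S = [(1, 4), (2, 7), (3, 5), (8, 15), (9, 13)]
N = [1, 4, 5]
print(smallest_cover_bruteforce(S, N))  # [(1, 4), (2, 7)]
```

Note that a smallest cover need not be unique: here `[(1, 4), (3, 5)]` is also a valid answer of size 2.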

I suspect a dynamic programming algorithm might not always work, so if anybody knows a solution to this problem (or to a similar problem that this one can be converted into), that would be great. I am trying to find an O(n^2) solution.

Here is one greedy approach:

P = {}
for each q in N: // O(n)
    if q is covered by some interval in P: // O(n)
        continue
    for each i in S: // O(n)
        if q in i: // O(n)
            P.add(i)
            break

But that is O(n^4). Any help with creating a greedy approach that is O(n^2) would be great!

Thanks!

Update: I've been slamming at this problem and I think I have an O(n^2) solution!!

Let me know if you think I'm right!!!

N = MergeSort(N)
lower, upper = infinity, -infinity   // empty range: nothing is covered yet
P = empty set
for each q in N do
     if not (q >= lower and q <= upper) then
          max_interval = [-infinity, -infinity]
          for each r in S do
              if q in r then
                 if r.rightEndPoint > max_interval.rightEndPoint then
                     max_interval = r
          P.append(max_interval)
          lower = max_interval.leftEndPoint
          upper = max_interval.rightEndPoint
          S.remove(max_interval)

I think this should work!! I'm trying to find a counterexample; but yeah!!
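
For anyone who wants to experiment with the updated idea, here is a rough Python sketch of it, under a few assumptions that are not in the pseudocode: intervals are closed `(lo, hi)` tuples, the function name `greedy_cover` is made up, and the function returns `None` when some number is not contained in any interval. Because the numbers are processed in sorted order, only the right end of the last chosen interval needs to be tracked.

```python
def greedy_cover(S, N):
    """For each not-yet-covered q (in sorted order), pick the interval
    containing q that extends furthest to the right.
    Returns the chosen intervals, or None if some q cannot be covered."""
    P = []
    upper = float("-inf")  # right end of the most recently chosen interval
    for q in sorted(N):
        if q <= upper:
            # The last chosen interval started at or before an earlier
            # (smaller) number, so q <= upper means q is already covered.
            continue
        best = None
        for r in S:  # O(len(S)) scan per uncovered q
            if r[0] <= q <= r[1] and (best is None or r[1] > best[1]):
                best = r
        if best is None:
            return None  # no interval contains q
        P.append(best)
        upper = best[1]
    return P


S = [(1, 4), (2, 7), (3, 5), (8, 15), (9, 13)]
print(greedy_cover(S, [1, 4, 5]))  # [(1, 4), (2, 7)]
```

With n numbers and n intervals this does a sort plus at most n scans of S, so it stays within O(n^2).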

heorhi · Accepted Answer

This problem is similar to the set cover problem, whose decision version is NP-complete (i.e., no polynomial-time algorithm for it is known). What makes this problem different is that, once N is sorted, each interval always covers a run of adjacent elements (not an arbitrary subset of N), which opens the way to faster solutions.

http://en.wikipedia.org/wiki/Set_cover_problem

I think that the solution proposed by Mike is good enough. But I think I have a quite straightforward O(N^2) greedy algorithm. It starts like Mike's (moreover, I believe Mike's solution can also be improved in a similar way):

1. Sort the N numbers and place them into an array ELEM. Complexity: O(N*lg N).
2. Using binary search, for each interval S[i] identify the starting and ending indices of the elements in ELEM that are covered by S[i]. Place this pair of indices into an array COVER. The difference between the two indices, plus one, tells you how many elements S[i] covers; for simplicity, store that count in an array COVER_COUNT. Complexity: O(N*lg N).
3. Introduce an index pointer p that shows up to which element of ELEM the list is already covered. Set p = 0, meaning that all elements up to the 0-th (excluded) are initially covered (i.e., no elements). Complexity: O(1). Also introduce a boolean array IS_INCLUDED that records whether interval S[i] is already included in the cover. Complexity: O(N).
4. Start from the 0-th element of ELEM and find the interval that contains ELEM[0] and has the greatest coverage COVER_COUNT[i]. Say it is the i-th interval. Mark it as included by setting IS_INCLUDED[i] to true. Then set p to end[i] + 1, where end[i] is the ending index in the COVER[i] pair (all elements up to end[i] are now covered). Knowing p, update all entries of COVER_COUNT so that they reflect how many not-yet-covered elements each interval covers (this can easily be done in O(N) time). Then repeat the same step for ELEM[p], and continue until p >= ELEM.length. Each round takes O(N) time and there are at most N rounds, so the overall complexity is O(N^2).

You finish in O(N^2), and IS_INCLUDED is true exactly for the intervals of S that are included in the optimal cover.
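
The steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: intervals are closed `(lo, hi)` tuples, the function name `interval_cover` is made up, and `None` is returned if some element cannot be covered at all. Instead of rewriting COVER_COUNT after every round, the sketch computes the remaining coverage of each candidate on the fly as `end - p + 1`, which gives the same choice.

```python
from bisect import bisect_left, bisect_right


def interval_cover(S, N):
    """Return a list IS_INCLUDED of booleans, one per interval in S,
    or None if some element of N is not contained in any interval."""
    ELEM = sorted(N)
    # COVER[i] = (start, end): indices in ELEM of the first and last element
    # inside S[i]; start > end means S[i] contains no element of ELEM.
    COVER = [(bisect_left(ELEM, lo), bisect_right(ELEM, hi) - 1) for lo, hi in S]
    IS_INCLUDED = [False] * len(S)
    p = 0  # all elements ELEM[0..p-1] are covered so far
    while p < len(ELEM):
        best, best_count = None, 0
        for i, (start, end) in enumerate(COVER):
            if start <= p <= end:        # S[i] contains ELEM[p]
                count = end - p + 1      # not-yet-covered elements S[i] covers
                if count > best_count:
                    best, best_count = i, count
        if best is None:
            return None                  # ELEM[p] is in no interval
        IS_INCLUDED[best] = True
        p = COVER[best][1] + 1
    return IS_INCLUDED


S = [(1, 4), (2, 7), (3, 5), (8, 15), (9, 13)]
print(interval_cover(S, [1, 4, 5]))  # [True, True, False, False, False]
```

Each pass of the `while` loop scans all intervals once and advances `p` by at least one, which matches the O(N^2) bound stated above.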

Let me know if this solution seems reasonable to you and if I calculated everything well.

P.S. Just wanted to add that the optimality of the solution found by the algorithm can be proved by induction and contradiction. By contradiction, it is easy to show that at least one optimal solution includes the longest interval among those covering element ELEM[0]. Given that, by induction we can show that at each subsequent step of the algorithm we can keep following the same strategy: select the interval that covers the leftmost not-yet-covered element and is the longest with respect to the number of remaining elements covered.
