Reputation: 3941

Finding intervals of a set that are overlapping

So I have a set containing the endpoints of intervals. For example,

Set s = {(1,4),(3,7),(5,8),(14,17),(0,2),(11,14)}

I need a way to find out how many overlapping intervals are there. In the above case, the answer would be 5 since

(1,4) --> (3,7), (0,2)
(3,7) --> (5,8),(0,2)
(5,8) --> 
(14,17) --> (11,14)

I need an algorithm that takes O(N log N) time to find out the sum. Now if I sort all the starting points and apply the answer suggested here on every point Find number range intersection I get an O(N^2) solution. Any clue on what kind of data structure I might need in addition to a set? Thanks!

Upvotes: 2

Answers (5)

Leon

Reputation: 1

Paul's answer is really smart. I don't think I could come up with that idea if it is an interview. Here I have another version which is also O(nlog(n)).

import heapq

def countIntervals(s):
    s.sort()
    end = [s[0][1]]
    res, cur = 0, 0
    for l in s[1:]:
        if l[0]>heapq.nsmallest(1, end)[0]:
            heapq.heappop(end)
        cur = len(end)
        res += cur
        end.append(l[1])
    return res

We maintain a min heap that stores upcoming endpoints. Every time when a new interval comes in, we should compare its start point with the smallest end point so far.

If the start point is bigger, it means the smallest end point (represents that interval) will never cause more overlaps. Hence we pop it out.

If the start point is smaller, it means all the end points (corresponding intervals) are overlapping with the new coming interval.

Then the number of end points ("cur") is the number of overlaps that brings by this new coming interval. After we updated the result, we add the new coming interval's endpoint to the heap.

Upvotes: 0

Paul Hankin

Reputation: 58221

Build a list of values (a, -1), (b, 1) for every interval [a, b]. Now sorting these lets you run through the array, counting how many intervals are currently open at each endpoint, just by aggregating the +1 and -1s.

It's important that (b, 1) comes after (b, -1) in the sorted list; because we want to count the intersection even if the intersection is at a single point.

Complete code here.

def overlaps(intervals):
    es = []
    for a, b in intervals:
        es.append((a, -1))
        es.append((b, 1))
    es.sort()
    result = 0
    n = 0
    for a, s in es:
        if s == -1: result += n
        n -= s
    return result

Note, this is trivially O(n log n), and doesn't need any fancy data-structures.

Upvotes: 4

Shubham Chawla

Reputation: 131

This can simply be done using Greedy technique. just follow the following steps:

Sort the intervals based on the starting point Iterate from lowest starting point to highest point Count the number of previous endpoints larger or equal than current starting point. Increment the count of current end point.

Upvotes: 0

justhalf

Reputation: 9107

First I assume from your example that (0,1) and (1,2) overlap.

Btw, your example is incorrect, (3,7) does not overlap with (0,2)

One way to solve this is:

Sort the intervals based on the starting point
Iterate from lowest starting point to highest point
2a. Count the number of previous endpoints larger or equal than current starting point.
2b. Add one to the count of current end point.

Step 1 can be done in O(n log n)
Step 2 is iterating over all the intervals while doing the count. So it's O(n * X) where X is the complexity to do the counting. With Fenwick Tree, we can do that in O(log n)¹, so step 2 can be completed in O(n log n) also.

So the overall complexity is O(n log n).

Pseudocode:

def fenwick_tree.get(num):
    return the sum from counter[0] to counter[num]

def fenwick_tree.add(num, val):
    add one to counter[num]

intervals = [...]
sort(intervals) # using the starting point as the key
init_fenwick_tree()
sum = 0
count = 0
for (starting_point, end_point) in intervals:
    sum = sum + (count - fenwick_tree.get(starting_point-1))
    fenwick_tree.add(end_point,1)
return sum

Python code (only works when the interval points are non-negative integers):

MAX_VALUE = 2**20-1
f_arr = [0]*MAX_VALUE

def reset():
    global f_arr, MAX_VALUE
    f_arr[:] = [0]*MAX_VALUE

def update(idx,val):
    global f_arr
    while idx<MAX_VALUE:
        f_arr[idx]+=val
        idx += (idx & -idx)

def read(idx):
    global f_arr
    if idx <= 0:
        return 0
    result = 0
    while idx > 0:
        result += f_arr[idx]
        idx -= (idx & -idx)
    return result

intervals = [(1,4),(3,7),(5,8),(14,17),(0,2),(11,14)]
intervals = sorted(intervals, key=lambda x: x[0])
reset()
total = 0
for processed, interval in enumerate(intervals):
    (start, end) = interval
    total += processed - read(start-1)
    update(end, 1)
print total

Will print 4, which comes from these overlaps:

(0,2) - (1,4)
(1,4) - (3,7)
(3,7) - (5,8)
(11,14) - (14,17)

Note that we can't get the overlapping intervals, since in the worst case there will be O(n^2) overlaps, which cannot be printed in O(n log n) time.

¹_{Actually Fenwick tree does the counting in O(log M) time, where M is the largest possible number of distinct values in the interval. Note that M <= 2n since there can only be that many distinct values. So it's also correct to say that Fenwick Tree does the counting in O(log n) time.}

Upvotes: 2

jacobm

Reputation: 14025

Quick idea: First sort your intervals. Now go through your intervals, adding each to a min-heap ordered by endpoint. For each interval, remove everything from the heap that is smaller than that interval's start point. Every endpoint remaining in the heap represents an interval that starts before this interval and that overlaps it, so increment an overlaps by the size of the heap. Now add the current interval to the heap and continue.

In pseudocode:

Sort(intervals)
(firstStart, firstEnd) = intervals[0]
currentIntervals = MinHeap()
overlaps = 0

for each (start, end) in intervals[1:]:
  remove from currentIntervals all numbers < start
  overlaps += Size(currentIntervals)
  HeapPush(end, currentIntervals)
return overlaps

I haven't tested this, but it seems at least plausible.

Upvotes: 1

Finding intervals of a set that are overlapping

Answers (5)

Related Questions