Reputation: 3941
So I have a set containing the endpoints of intervals. For example,
Set s = {(1,4),(3,7),(5,8),(14,17),(0,2),(11,14)}
I need a way to find out how many overlapping intervals are there. In the above case, the answer would be 5 since
(1,4) --> (3,7), (0,2)
(3,7) --> (5,8),(0,2)
(5,8) -->
(14,17) --> (11,14)
I need an algorithm that takes O(N log N)
time to find out the sum. Now if I sort all the starting points and apply the answer suggested here on every point Find number range intersection I get an O(N^2) solution. Any clue on what kind of data structure I might need in addition to a set? Thanks!
Upvotes: 2
Views: 5965
Reputation: 1
Paul's answer is really smart. I don't think I could come up with that idea if it is an interview. Here I have another version which is also O(nlog(n)).
import heapq
def countIntervals(s):
s.sort()
end = [s[0][1]]
res, cur = 0, 0
for l in s[1:]:
if l[0]>heapq.nsmallest(1, end)[0]:
heapq.heappop(end)
cur = len(end)
res += cur
end.append(l[1])
return res
We maintain a min heap that stores upcoming endpoints. Every time when a new interval comes in, we should compare its start point with the smallest end point so far.
If the start point is bigger, it means the smallest end point (represents that interval) will never cause more overlaps. Hence we pop it out.
If the start point is smaller, it means all the end points (corresponding intervals) are overlapping with the new coming interval.
Then the number of end points ("cur") is the number of overlaps that brings by this new coming interval. After we updated the result, we add the new coming interval's endpoint to the heap.
Upvotes: 0
Reputation: 58221
Build a list of values (a, -1), (b, 1) for every interval [a, b]. Now sorting these lets you run through the array, counting how many intervals are currently open at each endpoint, just by aggregating the +1 and -1s.
It's important that (b, 1) comes after (b, -1) in the sorted list; because we want to count the intersection even if the intersection is at a single point.
Complete code here.
def overlaps(intervals):
es = []
for a, b in intervals:
es.append((a, -1))
es.append((b, 1))
es.sort()
result = 0
n = 0
for a, s in es:
if s == -1: result += n
n -= s
return result
Note, this is trivially O(n log n), and doesn't need any fancy data-structures.
Upvotes: 4
Reputation: 131
This can simply be done using Greedy technique. just follow the following steps:
Sort the intervals based on the starting point Iterate from lowest starting point to highest point Count the number of previous endpoints larger or equal than current starting point. Increment the count of current end point.
Upvotes: 0
Reputation: 9107
First I assume from your example that (0,1)
and (1,2)
overlap.
Btw, your example is incorrect, (3,7)
does not overlap with (0,2)
One way to solve this is:
Step 1 can be done in O(n log n)
Step 2 is iterating over all the intervals while doing the count. So it's O(n * X)
where X
is the complexity to do the counting. With Fenwick Tree, we can do that in O(log n)
1, so step 2 can be completed in O(n log n)
also.
So the overall complexity is O(n log n)
.
Pseudocode:
def fenwick_tree.get(num):
return the sum from counter[0] to counter[num]
def fenwick_tree.add(num, val):
add one to counter[num]
intervals = [...]
sort(intervals) # using the starting point as the key
init_fenwick_tree()
sum = 0
count = 0
for (starting_point, end_point) in intervals:
sum = sum + (count - fenwick_tree.get(starting_point-1))
fenwick_tree.add(end_point,1)
return sum
Python code (only works when the interval points are non-negative integers):
MAX_VALUE = 2**20-1
f_arr = [0]*MAX_VALUE
def reset():
global f_arr, MAX_VALUE
f_arr[:] = [0]*MAX_VALUE
def update(idx,val):
global f_arr
while idx<MAX_VALUE:
f_arr[idx]+=val
idx += (idx & -idx)
def read(idx):
global f_arr
if idx <= 0:
return 0
result = 0
while idx > 0:
result += f_arr[idx]
idx -= (idx & -idx)
return result
intervals = [(1,4),(3,7),(5,8),(14,17),(0,2),(11,14)]
intervals = sorted(intervals, key=lambda x: x[0])
reset()
total = 0
for processed, interval in enumerate(intervals):
(start, end) = interval
total += processed - read(start-1)
update(end, 1)
print total
Will print 4
, which comes from these overlaps:
(0,2) - (1,4) (1,4) - (3,7) (3,7) - (5,8) (11,14) - (14,17)
Note that we can't get the overlapping intervals, since in the worst case there will be O(n^2)
overlaps, which cannot be printed in O(n log n)
time.
1Actually Fenwick tree does the counting in O(log M)
time, where M
is the largest possible number of distinct values in the interval. Note that M <= 2n
since there can only be that many distinct values. So it's also correct to say that Fenwick Tree does the counting in O(log n)
time.
Upvotes: 2
Reputation: 14025
Quick idea: First sort your intervals. Now go through your intervals, adding each to a min-heap ordered by endpoint. For each interval, remove everything from the heap that is smaller than that interval's start point. Every endpoint remaining in the heap represents an interval that starts before this interval and that overlaps it, so increment an overlaps
by the size of the heap. Now add the current interval to the heap and continue.
In pseudocode:
Sort(intervals)
(firstStart, firstEnd) = intervals[0]
currentIntervals = MinHeap()
overlaps = 0
for each (start, end) in intervals[1:]:
remove from currentIntervals all numbers < start
overlaps += Size(currentIntervals)
HeapPush(end, currentIntervals)
return overlaps
I haven't tested this, but it seems at least plausible.
Upvotes: 1