Reputation: 11916

Possible Interview Question: How to Find All Overlapping Intervals

It's not an interview question per se, as I came across this in my project, but I figured it could be a decent intervew question.

You have N pairs of intervals, say integers. You're required to indentify all intervals that overlap with each other in O(N) time. For example, if you have

{1, 3} {12, 14} {2, 4} {13, 15} {5, 10}

the answer is {1, 3}, {12, 14}, {2, 4}, {13, 15}. Note that you don't need to group them, so the result can be in any order like in the example.

I just threw in O(N) time because the KMP algorithm takes O(N) for string search. :D

The best I came up with and what I'm using right now in the project is O(N^2). Yeah, brute force is pretty sad, but no one complains so I won't refactor it. :P Still, I was curious if a greater mind has a more elegant solution.

Upvotes: 79

Answers (10)

gcbenison

Reputation: 11963

Sort the intervals by start point. Then fold over this list, merging each interval with its neighbor (i.e. (1,4),(2,6) -> (1,6)) if they overlap. The resulting list contains those intervals with no overlapping partner. Filter these out of the original list.

This is linear in time after the initial sort operation which could be done with any (n log n) algorithm. Not sure how you'd get around that. Even the 'filter out duplicates' operation at the end is linear if you take advantage of the already-sorted order of the input and output of the first operation.

Here is an implementation in Haskell:

-- Given a list of intervals, select those which overlap with at least one other inteval in the set.
import Data.List

type Interval = (Integer, Integer)

overlap (a1,b1)(a2,b2) | b1 < a2 = False
                       | b2 < a1 = False
                       | otherwise = True
                                     
mergeIntervals (a1,b1)(a2,b2) = (min a1 a2, max b1 b2)
                                     
sortIntervals::[Interval]->[Interval]
sortIntervals = sortBy (\(a1,b1)(a2,b2)->(compare a1 a2))

sortedDifference::[Interval]->[Interval]->[Interval]
sortedDifference [] _ = []
sortedDifference x [] = x
sortedDifference (x:xs)(y:ys) | x == y = sortedDifference xs ys
                              | x < y  = x:(sortedDifference xs (y:ys))
                              | y < x  = sortedDifference (x:xs) ys

groupIntervals::[Interval]->[Interval]
groupIntervals = foldr couldCombine []
  where couldCombine next [] = [next]
        couldCombine next (x:xs) | overlap next x = (mergeIntervals x next):xs
                                 | otherwise = next:x:xs
                                               
findOverlapped::[Interval]->[Interval]
findOverlapped intervals = sortedDifference sorted (groupIntervals sorted)
  where sorted = sortIntervals intervals
                                     
sample = [(1,3),(12,14),(2,4),(13,15),(5,10)]

Upvotes: 34

stackoverflowuser2010

Reputation: 41019

Here's an O(N lg N) implementation in Java that extends the answer provided by @Nikita Rybak.

My solution finds every interval that overlaps with at least one other interval and counts them both as overlapping intervals. For example, the two intervals (1, 3) and (2, 4) from OP's original question overlap each other, and so in this case there are 2 overlapping intervals. In other words, if interval A overlaps with interval B, then I add both A and B to the resulting set of intervals that overlap.

Now consider the intervals (1, 100), (10, 20) and (30, 50). My code will find that:

[ 10,  20] overlaps with [  1, 100]
[ 30,  50] overlaps with [  1, 100]

Resulting intervals that overlap with at least one other interval:
[  1, 100]
[ 30,  50]
[ 10,  20]

In order to prevent (1, 100) from being counted twice, I use a Java Set that keeps only unique Interval objects.

My solution follows this outline.

Sort all intervals by starting point. This step is O(N lg N).
Keep track of intervalWithLatestEnd, the interval with the latest end point.
Iterate over all the intervals in the sorted list. If an interval overlaps with intervalWithLatestEnd, then add both to a Set. Update intervalWithLatestEnd when needed. This step is O(N).
Return the Set (and convert to a List if needed).

The total running time is O(N lg N). It requires an output Set of size O(N).

Implementation

In order to add intervals to a set, I created a custom Interval class that override equals(), as expected.

class Interval {
    int start;
    int end;
    Interval(int s, int e) { 
        start = s; end = e; 
    }

    @Override
    public String toString() {
        return String.format("[%3d, %3d]", start, end);
    }
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + start;
        result = prime * result + end;
        return result;
    }
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        final Interval other = (Interval) obj;
        if (start != other.start)
            return false;
        if (end != other.end)
            return false;
        return true;
    }
}

And here is the code that runs the algorithm:

private static List<Interval> findIntervalsThatOverlap(List<Interval> intervals) {

    // Keeps unique intervals.
    Set<Interval> set = new HashSet<Interval>();

    // Sort the intervals by starting time.
    Collections.sort(intervals, (x, y) -> Integer.compare(x.start, y.start));

    // Keep track of the interval that has the latest end time.
    Interval intervalWithLatestEnd = null;

    for (Interval interval : intervals) {

        if (intervalWithLatestEnd != null &&
            interval.start < intervalWithLatestEnd.end) {

            // Overlap occurred.
            // Add the current interval and the interval it overlapped with.
            set.add(interval); 
            set.add(intervalWithLatestEnd);

            System.out.println(interval + " overlaps with " +
                               intervalWithLatestEnd);
        }

        // Update the interval with latest end.
        if (intervalWithLatestEnd == null ||
            intervalWithLatestEnd.end < interval.end) {

            intervalWithLatestEnd = interval;
        }
    }
    // Convert the Set to a List.
    return new ArrayList<Interval>(set);
}

Test cases

Here is a test case that runs the OP's original intervals:

public static void testcase() {

    List<Interval> intervals = null;
    List<Interval> result = null;

    intervals = new ArrayList<Interval>();

    intervals.add(new Interval(1, 3));
    intervals.add(new Interval(12, 14));
    intervals.add(new Interval(2, 4));
    intervals.add(new Interval(13, 15));
    intervals.add(new Interval(5, 10));


    result = findIntervalsThatOverlap(intervals);
    System.out.println("Intervals that overlap with at least one other interval:");
    for (Interval interval : result) {
        System.out.println(interval);
    }
}

with the result:

[  2,   4] overlaps with [  1,   3]
[ 13,  15] overlaps with [ 12,  14]
Intervals that overlap with at least one other interval:
[  2,   4]
[  1,   3]
[ 13,  15]
[ 12,  14]

Finally, here is a more advanced test case:

public static void testcase() {

    List<Interval> intervals = null;
    List<Interval> result = null;

    intervals = new ArrayList<Interval>();

    intervals.add(new Interval(1, 4));
    intervals.add(new Interval(2, 3));
    intervals.add(new Interval(5, 7));
    intervals.add(new Interval(10, 20));
    intervals.add(new Interval(15, 22));
    intervals.add(new Interval(9, 11));
    intervals.add(new Interval(8, 25));
    intervals.add(new Interval(50, 100));
    intervals.add(new Interval(60, 70));
    intervals.add(new Interval(80, 90));


    result = findIntervalsThatOverlap(intervals);
    System.out.println("Intervals that overlap with at least one other interval:");
    for (Interval interval : result) {
        System.out.println(interval);
    }
}

with the result:

[  2,   3] overlaps with [  1,   4]
[  9,  11] overlaps with [  8,  25]
[ 10,  20] overlaps with [  8,  25]
[ 15,  22] overlaps with [  8,  25]
[ 60,  70] overlaps with [ 50, 100]
[ 80,  90] overlaps with [ 50, 100]
Intervals that overlap with at least one other interval:
[  2,   3]
[  8,  25]
[  9,  11]
[ 50, 100]
[  1,   4]
[ 15,  22]
[ 10,  20]
[ 60,  70]
[ 80,  90]

Upvotes: 2

moinudin

Reputation: 138487

Throw the endpoints of the intervals into an array, marking them as either start- or end-points. Sort them by breaking ties by placing end-points before start-points if the intervals are closed, or the other way around if they're half-open.

1S, 2S, 3E, 4E, 5S, 10E, 12S, 13S, 14E, 15E

Then iterate through the list, keeping track of how many intervals we're in (this equates to number of start-points processed minus number of end-points). Whenever we hit a start-point while we are already in an interval, this means we must have overlapping intervals.

1S, 2S, 3E, 4E, 5S, 10E, 12S, 13S, 14E, 15E
    ^                          ^
   overlap                    overlap

We can find which intervals overlap with which by storing data alongside the end-points, and keeping track of which intervals we're in.

This is an O(N logN) solution, with sorting being the main factor.

Upvotes: 106

Tea Tagger

Reputation: 17

This problem can be reduced to the element uniqueness problem.

Element uniqueness has Omega(n log n) lower bound (counting number of comparisons), so you can't do better than that.

Upvotes: 0

will

Reputation: 45

If N pairs of intervals is integers, then we can get it in O(n).

Sort it by first number in the pair then the second number. If all are integers, we can use bucket sort or radix sort to get it by O(n).

{1, 3}, {2,4}, {5, 10}, {12, 14}, {13, 15}

Then combine one by one,

{1,3}

{1,4} with overlap {1,3} and {2,4}

{1,4}, {5,10}

{1,4}, {5,10}, {12,14}

{1,4}, {5,10}, {12,15} with overlap {12,14} and {13,15}

The combination would take O(N) time

Upvotes: 2

Jan Newmarch

Reputation: 21

Suppose the difference between start and endpoints is small, say < 32. eg. 1..32. Then each interval can be written as a bit pattern in a 32 bit word. e.g [1, 2] -> 001; [2, 3]-> 010; [1, 3] -> 011; [2, 3, 4] -> 110. Two intervals, or combinations of intervals, overlap if their bitwise AND is non-zero. eg. [1,2] overlaps [1,3] because 001&011 == 001, non-zero. O(n) alg is to keep a running bitwise OR of intervals seen so far and AND each new one:

bitsSoFar = 0
for (n=0; n < bitPatterns.length; n++)
    if (bitPatterns[n] & bitsSoFar != 0)
        // overlap of bitPatterns[n] with an earlier pattern
        bitsSoFar |= bitPatterns[n]

Left as an exercise:

modify algorithm to also identify overlap of a bit pattern with a later one
work out the bit pattern for an interval in O(1)

Upvotes: 2

jbx

Reputation: 22178

Not sure about O(N) but what if we first sort them by the first number in each tuple, and then sequentially find those where the first number of the tuple is greater than that of the largest number seen in previous tuples, which also do not overlap with the next tuple.

So you would first get:

{1, 3}, {2,4}, {5, 10}, {12, 14}, {13, 15}

since 4 (largest) < 5 and 10 < 12, {5, 10} is isolated.

This would entail that we keep track of the largest number we encounter, and each time we find a tuple whose starting number is greater we check if it overlaps with the next.

This becomes dependent on the efficiency of the sorting algorithm then, because the latter process would be O(N)

Upvotes: 5

Brett Stottlemyer

Reputation: 2852

It's been quite a while since I've used it, but the solution I used was an derivative of the red-black tree described in Introduction to Algorithms called an interval tree. It is a tree sorted by interval start, so you can quickly (binary search) first the first eligible node. IIRC, the nodes were ordered by a property that let's you stop "walking" the tree when the candidates nodes can't match your interval. So I think it was O(m) search, where m is the number of matching intervals.

I search found this implementation.

Brett

[edit] Rereading the question, this isn't what you asked. I think this is the best implementation when you have a list of (for instance) meetings already scheduled in conference rooms (which are added to the tree) and you want to find which rooms are still available for a meeting with a new start and duration (the search term). Hopefully this solution has some relevance, though.

Upvotes: 0

Nikita Rybak

Reputation: 68036

The standard approach for intervales-on-the-line problems is to sort them according to starting point and then just walk from first to last. O(n*logn) (O(n) if already sorted)

end = 0;
for (current in intervals) {
    if current.start < end {
        // there's an intersection!
        // 'current' intersects with some interval before it
        ...
    }
    end = max(end, current.end)
}

Upvotes: 10

escargot agile

Reputation: 22389

You can go over the list once and keep a hash table of all the intervals encountered so far. If an input interval is part of some interval from the hash table, merge it into the hash table interval. Mark the non-primitive intervals (merged intervals that consist of more than one interval).

Then you go over the list a second time, and for each interval check in the hash table whether it's contained in a merged interval or not.

I don't know if it's O(N), but it's much better than O(N^2).

Upvotes: -1

Possible Interview Question: How to Find All Overlapping Intervals

Answers (10)

Implementation

Test cases

Related Questions