poultrynews
poultrynews

Reputation: 659

Fast intersection of several interval ranges?

I have several variables, all of which are numeric ranges: (intervals in rows)

a = [ 1 4; 5 9; 11 15; 20 30];
b = [ 2 6; 12 14; 19 22];
c = [ 15 22; 24 29; 33 35];
d = [ 0 3; 15 17; 23 26];

(The values in my real dataset are not integers, but are represented here as such for clarity).

I would like to find intervals in which at least 3 of the variables intersect. In the above example [20 22] and [24 26] would be two such cases.

One way to approach this is to bin my values and add the bins together, but as my values are continuous, that'd create an 'edge effect', and I'd lose time binning the values in the first place. (binning my dataset at my desired resolution would create hundreds of GB of data).

Another approach, which doesn't involve binning, would use pairwise intersections (let's call it X) between all possible combinations of variables, and then the intersection of X with all other variables O(n^3).

What are your thoughts on this? Are there algorithms/libraries which have tools to solve this?

I was thinking of using sort of a geometric approach to solve this: Basically, if I considered that my intervals were segments in 1D space, then my desired output would be points where three segments (from three variables) intersect. I'm not sure if this is efficient algorithmically though. Advice?

Upvotes: 2

Views: 668

Answers (1)

Ben Voigt
Ben Voigt

Reputation: 283803

O(N lg N) method:

  1. Convert each interval (t_A, t_B) to a pair of tagged endpoints ('begin', t_A), ('end', t_B)

  2. Sort all the endpoints by time, this is the most expensive step

  3. Do one pass through, tracking nesting depth (increment if tag is 'start', decrement if tag is 'end'). This takes linear time.

    • When the depth changes from 2 to 3, it's the start of an output interval.
    • When it changes from 3 to 2, it's the end of an interval.

Upvotes: 9

Related Questions