Nico J.
Nico J.

Reputation: 317

Finding pair values in a given range

I have an array or N pairs (v1, v2) where v1 <= v2. These are supposed to represent events in time that start at v1 and end at v2. they can be equal, then the event is instantaneous. The array is sorted by starting time, v1.

For a given range (L, R), I would like to find any pair where L <= v1 <= R or L <= v2 <= R. The idea here is to get events starting, happening or ending in the given range.

My main problem is efficiency. The array could contains hundreds of thousands of events. So just a linear search going through all pairs is not an option.

I read a bit about kd-tree but the problem with it is that it excludes the boundaries of the range and would only return L <= v1 <= R AND L <= v2 <= R. That is, would only return events that actually happen (start AND end) in the range whereas I need start OR end (or both obviously).

I thought also about keeping 2 lookup tables (I use double for time)

std::map<double, Event*> startPoints;
std::map<double, Event*> endPoints;

and the use the std::find algorithm in both of them and merge the results.

Just looking for an advise, wether it's a good solution or if there is a more clever way.

EDIT:

Re-thinking about that, It is more complicated. Here is an example of the expected results

|---Ev1---|     |---Ev3---|     |---Ev5---|
        |---Ev2---|     |---Ev4---|
             |               |
             L               R

Here I would like to get Ev2 (which is ending in the range), Ev3 (Which is happening in the range and Ev4 (which is starting in the rage)

|---Ev1---|     |---Ev3---|     |---Ev5---|
        |---Ev2---|     |---Ev4---|
                    |    |
                    L    R

Here I would like to get Ev3 as it is currently running in the range and Ev4 as it is starting in the range

|---Ev1---|     |---Ev3---|     |---Ev5---|
        |---Ev2---|     |---Ev4---|
             |
             LR

Here I would like only Ev2 as it is the only one currently running.

Upvotes: 5

Views: 567

Answers (3)

Yola
Yola

Reputation: 19041

As you need to handle three cases - starting, happening or ending in the given range, we can split it into three parts.

  1. starting: v1 lies in [L,R].
  2. ending: v2 lies in [L,R].

The third case can be formulated as v1 <= R and L <= v2, but the first two cases partially cover this case, so we will use different formulation to avoid collisions:

  1. happening: v1 < L and R < v2

Well, it is easy to handle the first case in logarithmic plus number of reported events time if we can sort the array of events by v1. The same trick works for the second case.

The third case is trickier. Let's draw:

enter image description here

The pink area represents all intervals L <= R. The red dot is an interval and greenish area represents all possible events we want to capture. To do such a capture one can use k2-tree.

Upvotes: 3

darune
darune

Reputation: 11000

Using an indexed approach is fine - such as Boost.ICL solution.

That being said you could easily use a std::vector for this - even unsorted - I think for as long as you are within the range of some 100.000 or even 1.000.000 you should be fine (as long as you store actual values - not pointers in the vector as that can be slow) - exact numbers will of course depend on your thressholds.

struct MyEvent {
  double v1;//you use double for time
  double v2;
};


std::vector<MyEvent> events;

Here is an example using 1.000.000 elements:

http://coliru.stacked-crooked.com/a/9a6d90348f6915e1

and the searching runs in 42 ms which consists of one compare and optional copy while your case may be a bit different it is comparable.

Going further, you could get more power by parallelizing your search in some way using eg. std::for_each.

Upvotes: 1

Narek Aydinyan
Narek Aydinyan

Reputation: 335

std::map -->finding element complexity is O(logn) If your keys are unique and you don't have a memory problem, you can use std::unordered_map which complexity is amortized (O1). Also, you don't need to create 2 maps. std::unordered_map<double, std::pair<Event*, Event*>> StartEndPoints;. If your keys don't unique you can use std::unordered_multimap, but if your keys will be repeated a lot, the finding complexity could become (On). I'll suggest not to pass key type as a double.

std::hash<double> hashing.
auto temp = hashing(key). // decltype of temp will be size_t
std::unordered_map<std::size_t, std::pair<Event*, Event*>> StartEndPoints;

Upvotes: -1

Related Questions