Reputation: 3496
Let's say I have the following number ranges:
0-500 0-100 75-127 125-157 130-198 198-200
Now, let's say I need to be able to check any given number and see which ranges it is in. What sort of data structure would be used most efficiently to tell that, say, the number 100 belongs to the ranges 0-500, 0-100, and 75-127? Do I just want a binary tree holding the starting values? In that case, would it be feasible for each node in the tree to hold several objects containing each range at that starting point?
Note that I only need to retrieve for this particular application, I don't really see myself needing to modify it mid-process, so retrieval speed is by far my priority.
Thanks!
Upvotes: 9
Views: 2930
Reputation: 7769
Edited:
from
ex: {0 - #}x50 , {1 - #}x50 etc... for all from
ranges.Goal
Given 10,000 input range intervals (From to - To make the scenario worst, every 'from' is duplicated 50 times). The program takes one target value and displays all the ranges where the target belongs.
Example on how it works for 4 ranges:
Given the following ranges:
0 - 100
0 - 500
50 - 500
20 - 300
Target : 40
Output:
20 - 300
0 - 500
0 - 100
Explanation of the Algorithm (using the previous example):
Every from
value is mapped to an index++
starting from index = 0
. So, in our example:
From => index
0 => 0
50 => 1
20 => 2
Now having an array of Tree Set called pointers
, every index i
of this array refers to the key
of value i
in the Tree Set. So pointers[0]
refers to 'from': 0, pointers[1]
refers to 'from': 50, pointers[2]
refers to 'from': 20.
Now we add the to
values to each from
value respectively by looking at the tree map 'key' => 'value'
where value
is the index of key
in the pointers
array.
Moreover, we want the to
values added in the pointers
array at every index to be sorted in a descending order (Later I explain why).
So now the pointer becomes like this:
index => TreeSet[values..]
0 => 500 | 100
1 => 500
2 => 300
Now we are ready to get the ranges where the target belongs to.
For target = 40
1 - Search for the closest floor key of 40 in the tree map. The program finds that 20 is the closest one.
2 - It goes to the index corresponding to 20 in the pointers array. To get the index of 20: look at the tree map at key 20. Index of 20 is 2.
3 - So go now to pointers[2]
, it finds that there is the number 300.
4 - Now the application checks if 300 is less than target, the reason it does this, is because I mentioned before that every Tree Set created is sorted in a descending order. So if 300 is smaller than the target, therefore no need to continue checking for the next values in the pointers[2]
because it's guarantee that they are smaller.
5 - In this case, 300 is greater than target, then print the key as from
and pointer[2] {current element}
as to
.
6 - Since only one element found in pointers[2]
, the for loop exits, and the key is removed from the tree map, so next time when the program wants to find the next closest floor key, it will find the next one, if it exists.
7 - (Next iteration of while loop) The next key found, after removing key 20, is 0, with index 0 according to the tree map.
8 - Go to pointers[0]
. It finds that the number of elements in Tree Set at pointers[0]
is 2.
9 - Start with the first element pointers[0] {first element}
. Is 500 less than 40 ? No, print "range". Next element, is 100 less than 40 ? No, print "range". No more elements exit for loop.
10 - Remove key 0 from tree map.
11 - Now the While loop condition checks if there exist a closest floor key for the target. No, because 0 and 20 were removed. So condition not satisfied, exit while loop.
12 - Print time consumed, and end program :)
Hope you find the explanation useful.
Code
import java.util.Collections;
import java.util.TreeMap;
import java.util.TreeSet;
public class Interval {
public static void main(String[] args) {
int MAX_SIZE = 10000;
int[] from = new int[MAX_SIZE];
int[] to = new int[MAX_SIZE];
//Generate 10,000 (from - to), with 50 redundant (from range) for every value of from. (to make the application heavy)
int c = 0;
for(int i=0; i<MAX_SIZE;i++){
from[i] = 0+c;
to[i] = from[i] + (int)(Math.random()*100);
if(i%50 == 0)
c++;
}
//Start time counting
long time = System.currentTimeMillis();
int target = 97;
TreeMap<Integer, Integer> treePointer = new TreeMap<Integer, Integer>(); // will sotre <From, index++>
int index = 0;
int size = from.length;
TreeSet<Integer>[] pointers = new TreeSet[size]; //Array of tree set to store values of every "from" range
//insert into tree
for(int i=0; i<from.length;i++){
if(!treePointer.containsKey(from[i])){ //if the "from" does not exist in the tree yet, insert it
treePointer.put(from[i], index);
pointers[index++] = new TreeSet<Integer>(Collections.reverseOrder()); //sort descending order
}
//index of "from" in the pointers array
int virtualIndex = treePointer.get(from[i]);
//add the 'to' element to the corresponding index of "from" in Tree Set at pointers[index of "from"]
pointers[virtualIndex].add(to[i]);
}
// Display part of the pointers array to understand how elements are stored
// for(int i=0; i<10; i++){
// for(int current : pointers[i]){
// System.out.print(current + " ");
// }
// System.out.println();
// }
//Start checking for the ranges
Integer currentKey = -1; //dummy number at first
while((currentKey = treePointer.floorKey(target)) != null){ // while there is a closest floor key
//get index assigned to the closest floor key
int virtualIndex = treePointer.get(currentKey);
//loop on the elements found at pointers[index of closest floor number]
for(int toValue : pointers[virtualIndex]){
if(toValue < target) //remember the values are sorted in a descending order, so whenever the value becomes smaller than the target don't continue the for loop
break;
System.out.println(currentKey + " - " + toValue); // else target less or equal to toValue, so print range
}
treePointer.remove(currentKey); //remove key from tree to fetch the next floor key in the next while iteration
}
//Display time consumed in ms
System.out.println("\nTotal time: " + (System.currentTimeMillis() - time) + " ms");
}
}
Upvotes: 1
Reputation: 4772
Assuming you have enough memory, I would use an array of lists, where the list is of all ranges containing the index.
Here is a Python solution showing the algorithm in more detail:
# (Inclusive) ranges
ranges = [(0,500), (0,100), (75,127), (125,157), (130,198), (198,200)]
smallest = min(r[0] for r in ranges)
largest = max(r[1] for r in ranges)
# Ceate table
table = [[] for i in range(smallest, largest+1)] # List of lists
for r in ranges: # pre-compute results
mn, mx = r
for index in range(mn, mx+1):
table[index - smallest].append(r)
def check(n):
'Return list of ranges containing n'
if smallest <= n <= largest:
return table[n - smallest]
else:
return [] # Out of range
for n in [-10, 10, 75, 127, 129, 130, 158, 197, 198, 199, 500, 501]:
print('%3i is in groups: %r' % (n, check(n)))
The output is:
-10 is in groups: []
10 is in groups: [(0, 500), (0, 100)]
75 is in groups: [(0, 500), (0, 100), (75, 127)]
127 is in groups: [(0, 500), (75, 127), (125, 157)]
129 is in groups: [(0, 500), (125, 157)]
130 is in groups: [(0, 500), (125, 157), (130, 198)]
158 is in groups: [(0, 500), (130, 198)]
197 is in groups: [(0, 500), (130, 198)]
198 is in groups: [(0, 500), (130, 198), (198, 200)]
199 is in groups: [(0, 500), (198, 200)]
500 is in groups: [(0, 500)]
501 is in groups: []
Further optimisations such as using a bit-set to store what ranges are at each index instead of a list are possible.
From the comments below, I decided to modify the above to choose between the table lookup method above and a slower but vastly more memory efficient solution based on direct comparison. (Ideally there would be an interval tree based solution included too):
# (Inclusive) ranges
ranges = [(0,500), (0,100), (75,127), (125,157), (130,198), (198,200)]
limit = 1000000 # Or whatever
smallest = min(r[0] for r in ranges)
largest = max(r[1] for r in ranges)
if (largest - smallest) * len(ranges) < limit:
# Ceate table
table = [[] for i in range(smallest, largest+1)] # List of lists
for r in ranges:
mn, mx = r
for index in range(mn, mx+1):
table[index - smallest].append(r)
def check(n):
'Return list of ranges containing n'
if smallest <= n <= largest:
return table[n - smallest]
else:
return [] # Out of range
else:
# mpre emory efficient method, for example
def check(n):
return [(mn, mx) for mn, mx in ranges if mn <= n <= mx]
for n in [-10, 10, 75, 127, 129, 130, 158, 197, 198, 199, 500, 501]:
print('%3i is in groups: %r' % (n, check(n)))
Upvotes: 1
Reputation: 6881
What you need is interval tree. Interval trees are quite general concept and keep a slightly different data in nodes for every problem. In your case each node would keep a list of input intervals that cover the interval node is representing.
Upvotes: 6
Reputation: 3200
Let R
denote the number of ranges possible. (For your example R=6
)
Create R
hashtables such that each hash table can hold only one range. For your example you need to create 6 hashtables. The first hash table R1
will contain values from 0-500 only.
Populate hashtable.
Each number will go into appropriate hashtables. For your example number 100
will go into R1
, R2
, R3
. If R
is large, then you would need to create lot of hashtables. But then the total space is bounded by the actual data stored in all the hashtables put together.
Retrival:
For any given number check if it is present in each of the R
hashtables. You can further optimize by choosing which hashtables to look. For example for 100
you only need to look into 3 out of 6 hashtables.
Time complexity:
Searching for a single value in a hash table takes constant time
(on average). So
Amortized O(1)
to see if a number is present in a hashtable.
Amortized O(R)
to produce the output as we need to look at all the hashtables to produce the output
Upvotes: 2