Reputation: 20039
I have a set of integer ranges, which represent lower and upper bounds of classes. For example:
0..500 xsmall
500..1000 small
1000..1500 medium
1500..2500 large
In my case there can be over 500 classes. These classes do not overlap, but they can differ in size.
I can implement finding the matching range as a simple linear search through a list, for example:
class Range
{
    int lower;
    int upper;
    String category;

    // half-open range: lower is inclusive, upper is exclusive
    boolean contains(int val)
    {
        return lower <= val && val < upper;
    }
}

public String getMatchingCategory(int val)
{
    // listOfRanges is the collection holding all Range instances
    for (Range r : listOfRanges)
    {
        if (r.contains(val))
        {
            return r.category;
        }
    }
    return null;
}
However, this seems slow, as I need N/2 look-ups on average. If the classes were equally sized, I could use division. Is there a standard technique to find the correct range faster?
Upvotes: 6
Views: 2110
Reputation: 200168
What you are looking for is a SortedMap and its methods tailMap and firstKey. Check out the documentation for full details.
The advantage of this approach over plain arrays is the ease of maintaining your ranges: you can insert or remove boundaries at any point at almost no runtime cost, whereas with arrays it means copying both parallel arrays in full.
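As a minimal sketch of that idea (my own illustration, keyed on the inclusive lower bounds and using NavigableMap.floorEntry, a close relative of tailMap/firstKey), assuming the half-open ranges from the question:

import java.util.NavigableMap;
import java.util.TreeMap;

public class RangeLookup
{
    // keys are the inclusive lower bounds of each range
    private final NavigableMap<Integer, String> categories = new TreeMap<>();

    public RangeLookup() {
        categories.put(0, "xsmall");      // 0..500
        categories.put(500, "small");     // 500..1000
        categories.put(1000, "medium");   // 1000..1500
        categories.put(1500, "large");    // 1500..2500
    }

    public String getMatchingCategory(int val) {
        if (val < 0 || val >= 2500) return null;          // outside all ranges
        // greatest lower bound <= val identifies the containing range
        return categories.floorEntry(val).getValue();
    }
}

Adding or removing a boundary is then a single put or remove on the map.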
I've written code for both variants and benchmarked it:
import org.openjdk.jmh.annotations.*;

import java.util.Arrays;
import java.util.Random;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class BinarySearch
{
    static final int ARRAY_SIZE = 1024, INCREMENT = 1000;
    static final int[] arrayK = new int[ARRAY_SIZE];
    static final String[] arrayV = new String[ARRAY_SIZE];
    static final SortedMap<Integer, String> map = new TreeMap<>();
    static {
        // fill the parallel arrays and the map with the same boundary keys
        for (int i = 0, j = 0; i < arrayK.length; i++) {
            arrayK[i] = j; arrayV[i] = String.valueOf(j);
            map.put(j, String.valueOf(j));
            j += INCREMENT;
        }
    }

    final Random rnd = new Random();
    int rndInt;

    @Setup(Level.Invocation) public void nextInt() {
        rndInt = rnd.nextInt((ARRAY_SIZE - 1) * INCREMENT);
    }

    @GenerateMicroBenchmark
    public String array() {
        final int i = Arrays.binarySearch(arrayK, rndInt);
        // a negative result encodes the insertion point as -(i + 1)
        return arrayV[i >= 0 ? i : -(i + 1)];
    }

    @GenerateMicroBenchmark
    public String sortedMap() {
        // the first value of the tail map belongs to the least key >= rndInt
        return map.tailMap(rndInt).values().iterator().next();
    }
}
Benchmark results:
Benchmark Mode Thr Cnt Sec Mean Mean error Units
array thrpt 1 5 5 10.948 0.033 ops/usec
sortedMap thrpt 1 5 5 5.752 0.070 ops/usec
Interpretation: array search is only about twice as fast, and this factor is quite stable across array sizes. In the presented code the array size is 1024 and the factor is 1.9; I've also tested with array size 128, where the factor is 2.05.
Upvotes: 4
Reputation: 46432
Here, Arrays.binarySearch is your friend. Simply put all the boundaries in and handle the possible cases. Assuming your ranges leave no holes between them, you only need to put the upper bounds in.
For your example
0..500 xsmall
500..1000 small
1000..1500 medium
1500..2500 large
you'd use
int[] boundaries = {500, 1000, 1500, 2500};
and look up the input. Handle the two cases (found/not found) and you're done. Forget about ranges; they're nice, but they don't fit your problem.
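A rough sketch of that case handling (my own illustration, treating the upper bounds as exclusive, as in the question):

import java.util.Arrays;

public class BoundaryLookup
{
    static final int[] boundaries = {500, 1000, 1500, 2500};              // exclusive upper bounds
    static final String[] categories = {"xsmall", "small", "medium", "large"};

    public static String getMatchingCategory(int val) {
        int i = Arrays.binarySearch(boundaries, val);
        // found: val equals an upper bound, so it belongs to the next range;
        // not found: -(i + 1) is the insertion point, i.e. the first bound > val
        int idx = i >= 0 ? i + 1 : -(i + 1);
        return val >= 0 && idx < categories.length ? categories[idx] : null;
    }
}

Whether an exact hit on a boundary falls into the lower or the upper range depends on which end you treat as inclusive, so adjust the found branch to your convention.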
I also wrote a benchmark, and no matter how I try, I'd lose my bet, as the ratio is about 3 rather than 5. The strange labels like S001024 in my results stand for the size, here 1024.
Upvotes: 1