Whole Brain

Reputation: 2167

Is this the most effective method to get the closest number in a sorted list?

I have large sorted arrays of integers (with sizes between 10'000 and 1'400'000). I want to get the first integer larger than a given value. The value is never in the array.

I've looked at various solutions, but I have only found:

  1. methods that examine every value and are not designed for sorted lists or arrays (with O(n) time complexity).
  2. methods that are recursive and/or not designed for very large lists or arrays (with O(n) or worse time complexity; they were in other languages, though, so I'm not sure).

I've designed my own method. Here it is:

int findClosestBiggerInt(int value, int[] sortedArray) {
    if( sortedArray[0]>value ||
            value>sortedArray[sortedArray.length-1] )   // for my application's convenience only. It could also return the last.
        return sortedArray[0];

    int exp = (int) (Math.log(sortedArray.length)/Math.log(2)),
        index = (int) Math.pow(2,exp);
    boolean dir; // true = ascend, false = descend.
    while(exp>=0){
        dir = sortedArray[Math.min(index, sortedArray.length-1)]<value;
        exp--;
        index = (int)( index+ (dir ? 1 : -1 )*Math.pow(2,exp) );
    }

    int answer = sortedArray[index];
    return answer > value ? answer : sortedArray[index+1];
}

It has O(log n) time complexity. With an array of length 1'400'000, it will loop 21 times inside the while block. Still, I'm not sure that it cannot be improved.
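As a sanity check on the 21 iterations, here is a minimal sketch that reproduces the count with integer arithmetic only. The class and method names are made up for illustration; the only assumption is that the while loop starts at exp = floor(log2(length)) and runs until exp drops below 0, as in the method above.

```java
class IterationCount {
    // floor(log2(n)) for n > 0, computed without Math.log or Math.pow.
    static int floorLog2(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    // The loop runs for exp = floorLog2(length), ..., 1, 0, i.e. exp + 1 times.
    static int loopIterations(int length) {
        return floorLog2(length) + 1;
    }
}
```

For a length of 1'400'000 the highest set bit is 2^20, so loopIterations gives 21; for 10'000 it gives 14.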

Is there a more effective way to do it, without the help of external packages? Any time saved is great because this calculation occurs quite frequently.

Upvotes: 1

Views: 580

Answers (3)

ruakh

Reputation: 183251

As Gene's answer indicates, you can do this with binary search. The built-in java.util.Arrays class provides a binarySearch method to do that for you:

int findClosestBiggerInt(final int value, final int[] sortedArray) {
    final int index = Arrays.binarySearch(sortedArray, value + 1);
    if (index >= 0) {
        return sortedArray[index];
    } else {
        return sortedArray[-(index + 1)];
    }
}

You'll find that to be much faster than the method you wrote; it's still O(log n) time, but the constant factors will be much lower, because it doesn't perform expensive operations like Math.log and Math.pow.
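Two edge cases are worth keeping in mind with the snippet above: value + 1 overflows when value is Integer.MAX_VALUE, and when every element is smaller than value the insertion point equals sortedArray.length, so the array access throws. A minimal sketch that avoids both by searching for value itself; the fallback parameter and the class name are hypothetical, added only for illustration:

```java
import java.util.Arrays;

class ClosestBigger {
    // Returns the smallest element strictly greater than value,
    // or fallback (hypothetical parameter) when no such element exists.
    static int findClosestBiggerInt(int value, int[] sortedArray, int fallback) {
        int index = Arrays.binarySearch(sortedArray, value);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1,
        // and the insertion point is the index of the first element > value.
        int pos = index >= 0 ? index + 1 : -(index + 1);
        // Only matters if value can occur in the array: skip past any duplicates.
        while (pos < sortedArray.length && sortedArray[pos] <= value) {
            pos++;
        }
        return pos < sortedArray.length ? sortedArray[pos] : fallback;
    }
}
```

For the convention in the question, the caller could pass sortedArray[0] as the fallback.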

Upvotes: 1

Gene

Reputation: 46960

Binary search is easily modified to do what you want.

Standard binary search for exact match with the target maintains a [lo,hi] bracket of integers where the target value - if it exists - is always inside. Each step makes the bracket smaller. If the bracket ever gets to size zero (hi < lo), the element is not in the array.

For this new problem, the invariant is exactly the same except for the definition of the target value. We must take care never to shrink the bracket in a way that might eliminate the next bigger element.

Here's the "standard" binary search:

int search(int tgt, int [] a) {
  int lo = 0, hi = a.length - 1;
  // loop while the bracket is non-empty
  while  (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    // if a[mid] is below the target, ignore it and everything smaller
    if (a[mid] < tgt) lo = mid + 1;
    // if a[mid] is above the target, ignore it and everything bigger
    else if (a[mid] > tgt) hi = mid - 1;
    // else we've hit the target
    else return mid;
  }
  // The bracket is empty. Return "nothing."
  return -1;
}

In our new case, the part that obviously needs a change is:

    // if a[mid] is above the target, ignore it and everything bigger
    else if (a[mid] > tgt) hi = mid - 1;

That's because a[mid] might be the answer. We can't eliminate it from the bracket. The obvious thing to try is to keep a[mid] around:

    // if a[mid] is above the target, ignore everything bigger
    else if (a[mid] > tgt) hi = mid;

But now we've introduced a new problem in one case. If lo == hi, i.e. the bracket has shrunk to 1 element, this if doesn't make progress! It sets hi = mid = lo + (hi - lo) / 2 = lo. The size of the bracket remains 1. The loop never terminates.

Therefore, the other adjustment we need is to the loop condition: stop when the bracket reaches size 1 or less:

  // loop while the bracket has more than 1 element.
  while  (lo < hi) {

For a bracket of size 2 or more, lo + (hi - lo) / 2 is always smaller than hi. Setting hi = mid makes progress.

The last modification we need is checking the bracket after the loop terminates. There are now three cases rather than one in the original algorithm:

  1. the bracket is empty,
  2. it contains one element, which is the answer, or
  3. it contains one element, which is not the answer.

It's easy to sort these out just before returning. In all, we have:

int search(int tgt, int [] a) {
  int lo = 0, hi = a.length - 1;
  while  (lo < hi) {
    int mid = lo + (hi - lo) / 2;
    if (a[mid] < tgt) lo = mid + 1;
    else if (a[mid] > tgt) hi = mid;
    else return mid;
  } 
  return lo > hi || a[lo] < tgt ? -1 : lo;
}

As you point out, for a 1.4 million element array, this loop will execute no more than 21 times. My C compiler produces 28 instructions for the whole thing; the loop is 14. 21 iterations ought to be a handful of microseconds. It requires only small constant space and generates zero work for the Java garbage collector. Hard to see how you'll do better.
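If the target may also occur in the array and you want the first element strictly greater than it, a common variant (often called "upper bound") uses a half-open bracket [lo, hi) and drops the equality branch. This is a sketch of that variant, not the code above; the names are illustrative:

```java
class UpperBound {
    // Index of the first element strictly greater than tgt, or -1 if none.
    static int firstGreater(int tgt, int[] a) {
        int lo = 0, hi = a.length; // half-open bracket [lo, hi)
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            // a[mid] cannot be the answer: discard it and everything below
            if (a[mid] <= tgt) lo = mid + 1;
            // a[mid] might be the answer: keep it as the new exclusive bound
            else hi = mid;
        }
        // lo == a.length means every element was <= tgt
        return lo < a.length ? lo : -1;
    }
}
```

Because hi starts at a.length, the "no such element" case falls out of the loop naturally and only one check is needed after it.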

Upvotes: 1

WJS

Reputation: 40034

Is there a more effective way to do it, without the help of external packages? Any time saved is great because this calculation occurs quite frequently.

Well here is one approach that uses a map instead of an array.

      Random r = new Random(); // java.util.Random
      int categorizer = 10_000;
      // Assume this is your array of ints.
      int[] arrayOfInts = r.ints(4_000, 10_000, 1_400_000).toArray();

You can group them in a map like so.

       Map<Integer, List<Integer>> ranges =
            Arrays.stream(arrayOfInts).sorted().boxed().collect(
                  Collectors.groupingBy(n -> n / categorizer));

Now, when you want to find the next higher element, you can get the list that would contain the number.

Say you want the next number larger than 982,828.

      int target = 982_828;
      List<Integer> list = ranges.get(target / categorizer); // gets the list at key = 98

Now just process the list with your favorite method. One note: depending on the gaps between values, the next higher number may not be in this list at all but in one of the lists that come after it, and the list for this key may not even exist. You would need to account for this, perhaps by adjusting how the numbers are categorized or by searching subsequent lists. But this can greatly reduce the size of the lists you're working with.
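One way to handle the "subsequent lists" case is to collect into a TreeMap and walk the buckets from the target's key upward. This is only a sketch under some assumptions: values are non-negative (so n / CATEGORIZER buckets behave as expected), and the class name, method names and fallback parameter are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BucketLookup {
    static final int CATEGORIZER = 10_000;

    // Group the sorted values into buckets keyed by value / CATEGORIZER.
    // A TreeMap keeps the bucket keys in ascending order.
    static TreeMap<Integer, List<Integer>> buildRanges(int[] values) {
        return Arrays.stream(values).sorted().boxed().collect(
                Collectors.groupingBy(n -> n / CATEGORIZER, TreeMap::new, Collectors.toList()));
    }

    // Smallest value strictly greater than target, or fallback if there is none.
    static int nextHigher(TreeMap<Integer, List<Integer>> ranges, int target, int fallback) {
        // Start at the target's own bucket (if present) and continue with later ones.
        for (List<Integer> bucket : ranges.tailMap(target / CATEGORIZER, true).values()) {
            for (int n : bucket) {
                if (n > target) return n;
            }
        }
        return fallback;
    }
}
```

The inner scan is linear for simplicity; since each bucket is sorted, it could be replaced with a binary search over the list.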

Upvotes: 2
