shole
shole

Reputation: 4094

What is the fastest algorithm to determine if any number in a sorted array is multiple of `x`?

Given an positive integer x and a sorted positive integer array A

Is there any algorithm faster than O(N) to determine if any element in A is a multiple of x? There are no negative elements in A.

Naive looping A once is my only idea so far, I do not know if there is any way to make use of the fact that A is sorted to speed it up.

Upvotes: 17

Views: 3494

Answers (6)

Tatarize
Tatarize

Reputation: 10806

HT @ tobias_k's comment.

You can solve it in ~ O(n/x) (update this might actually be O(N*log(N)/X^2)). You do a binary search for all multiples of x simultaneously. Where you subdivide each search space each iteration and when the search space cannot contain a multiple of x, you abort that branch. So rather than binary search each value, you binary search all values but only for those branches that still contain a valid multiple within their range. The best thing about this is it utterly prevents researching the same space twice which makes the worst case x=1 or O(n/1) O(n). In the best case it will know that the range cannot contain a multiple and abort in O(1).

Since you are guaranteed a worse case of O(n) wherein you basically miss every damned cache lookup (keep in mind in real world terms this might end up being more important than the time complexity, hence test such things). You would get theoretical time complexity that could be better than O(n) but never worse than that (save the jumping around an array will miss caches because that's how computers end up actually working in the real world thing).

As predicted the speed increase depends a lot on the value of k (x).

This starts going faster than the raw loop at k = ~128. (divide factor)

Truncating the branches manages to real world surpass raw loop. I'm assuming the n count isn't going to matter much as it seemed to scale about the same, but perhaps directly checking that is better.

Note: by the nature of this code it will skip doubles, which is the difference in the counts.

public class MultipleSearch {

    public static void main(String[] args) {
        Random random = new Random();
        int[] array = new int[500000000];
        for (int i = 0, m = array.length; i < m; i++) {
            array[i] = Math.abs(random.nextInt());
        }
        Arrays.sort(array);
        for (int k = 1; k < 16777216; k *= 2) {
            long time;
            time = System.currentTimeMillis();
            binaryFactorLocator(array, k);
            System.out.println("Factors found multi: " + (System.currentTimeMillis() - time) + " @" + k);
            time = System.currentTimeMillis();
            loopFactorLocator(array, k);
            System.out.println("Factors found loop: " + (System.currentTimeMillis() - time) + " @" + k);
        }
    }

    public static void loopFactorLocator(int[] array, int k) {
        int count = 0;
        for (int i = 0, m = array.length; i < m; i++) {
            if (array[i] % k == 0) {
                count++;
                //System.out.println("loop: " + array[i] + " value at index " + i + " is a proven multiple of " + k);
            }
        }
        System.out.println(count + " items found.");
    }

    public static void binaryFactorLocator(int[] array, int k) {
        int count = binaryFactorLocator(0, array, k, 0, array.length);
        System.out.println(count + " items found.");
    }

    public static int binaryFactorLocator(int count, int[] array, int k, int start, int end) {
        if (start >= end) { //contains zero elements. (End is exclusive)
            return count;
        }
        int startValue = array[start]; //first value
        int endValue = array[end - 1]; //last value;
        if (startValue / k == endValue / k) { //if both values are contained within the same factor loop.
            if (startValue % k == 0) { //check lower value for being perfect factor.
                //System.out.println("multi-binary: " + startValue + " value at index " + start + " is a proven multiple of " + k);
                return count + 1;
            }
            return count; //There can be no other factors within this branch.
        }
        int midpoint = (start + end) / 2; //subdivide
        count = binaryFactorLocator(count, array, k, start, midpoint); //recurse.
        count = binaryFactorLocator(count, array, k, midpoint, end); //recurse.
        return count;
    }
}

This implementation should be pretty solid, since it truncates the loop within the start/k == end/k elements it should skip double (sometimes, it might cleave between two doubled values). Obviously recursive like this is likely not going to be optimal and should be perhaps rewritten with a less callstack stack.

474682772 items found.
Factors found multi: 21368 @1
500000000 items found.
Factors found loop: 5653 @1
236879556 items found.
Factors found multi: 21573 @2
250000111 items found.
Factors found loop: 7782 @2
118113043 items found.
Factors found multi: 19785 @4
125000120 items found.
Factors found loop: 5445 @4
58890737 items found.
Factors found multi: 16539 @8
62500081 items found.
Factors found loop: 5277 @8
29399912 items found.
Factors found multi: 12812 @16
31250060 items found.
Factors found loop: 5117 @16
14695209 items found.
Factors found multi: 8799 @32
15625029 items found.
Factors found loop: 4935 @32
7347206 items found.
Factors found multi: 5886 @64
7812362 items found.
Factors found loop: 4815 @64
3673884 items found.
Factors found multi: 3441 @128
3906093 items found.
Factors found loop: 4479 @128
1836857 items found.
Factors found multi: 2100 @256
1953038 items found.
Factors found loop: 4592 @256
918444 items found.
Factors found multi: 1335 @512
976522 items found.
Factors found loop: 4361 @512
459141 items found.
Factors found multi: 959 @1024
488190 items found.
Factors found loop: 4447 @1024
229495 items found.
Factors found multi: 531 @2048
243961 items found.
Factors found loop: 4114 @2048
114715 items found.
Factors found multi: 295 @4096
121964 items found.
Factors found loop: 3894 @4096
57341 items found.
Factors found multi: 195 @8192
61023 items found.
Factors found loop: 4061 @8192
28554 items found.
Factors found multi: 106 @16384
30380 items found.
Factors found loop: 3757 @16384
14282 items found.
Factors found multi: 65 @32768
15207 items found.
Factors found loop: 3597 @32768
7131 items found.
Factors found multi: 35 @65536
7575 items found.
Factors found loop: 3288 @65536
3678 items found.
Factors found multi: 17 @131072
3883 items found.
Factors found loop: 3281 @131072
1796 items found.
Factors found multi: 13 @262144
1900 items found.
Factors found loop: 3243 @262144
873 items found.
Factors found multi: 6 @524288
921 items found.
Factors found loop: 2970 @524288
430 items found.
Factors found multi: 3 @1048576
456 items found.
Factors found loop: 2871 @1048576
227 items found.
Factors found multi: 2 @2097152
238 items found.
Factors found loop: 2748 @2097152
114 items found.
Factors found multi: 1 @4194304
120 items found.
Factors found loop: 2598 @4194304
48 items found.
Factors found multi: 0 @8388608
51 items found.
Factors found loop: 2368 @8388608

Upvotes: 14

Zen
Zen

Reputation: 119

In a previous attempt, I tried a simple dichotomic search but, as was pointed out, that doesn't actually lead anywhere.

Here's my best attempt. I doubt it's worth the hassle, and it might even be slower for real world cases, but here goes.

If you have an array A[0..n] of sorted, positive integers, and you want to check if there is a multiple of a positive integer X in A[i..j], where 0≤i

If i>j then A[i..j] ist empty and thus contains no multiple of X
Else if A[i] % X = 0 or A[j] % X = 0 then A[i..j] contains a multiple of X
Else if A[i] / X = A[j] / X then A[i..j] contains no multiple of X
Else A[i..j] contains a multiple of X iff A[i+1..(i+j)/2] or A[(i+j)/2+1..j-1] contains a multiple of X

My guess is that the complexity would be O(n/X) ish, so, not really an improvement in the grand scheme of things.

Edited to add: If your data is really peculiar, this might actually help. In many cases, it might actually hurt:

  • there’s the overhead of managing the return stack (the first recursive call is non-terminal)
  • because we skip back and forth through the data instead of traversing it, we wreck cache locality
  • we make branch prediction more difficult for the processor

Upvotes: 6

catastrophic-failure
catastrophic-failure

Reputation: 3905

Subtract x from A iteratively. Stop if any result is equal to 0.

Regarding the subtractions, you do it 1 + {max(A) DIV x} times at the worst scenario this way, in that you have to subtract x from the maximum element of A after all others have failed the check already, and once more (hence the 1) to the maximum element itself, which also fails the check, like 7 DIV 3 = 2, so in three iterations:

  1. 7 - 3 = 4
  2. 4 - 3 = 1
  3. 1 - 3 = -2 , != 0 therefore it's not divisible

This still qualifies as O(n), but ends being fast, doing integer subtraction on an array.

Upvotes: 0

Santanu Lahiri
Santanu Lahiri

Reputation: 1

Two things to try - first find the first element in the array greater than x. No need to search the bottom half. A binary search can do this. That will reduce the time in some cases, but if the very first element is greater than x, then no need obviously. Then, having identified the section with possible multiples of x, do a binary search if running a single thread, or if you can run multiple threads, split the upper half into segments, and have a separate thread search each. I think this may be the most efficient way, but subject to the following caveats.

  1. If the first element greater than x is fairly low in the array, you are going to spend more time setting up the binary search than in linear scanning.

  2. There is a cost to doing binary searches too. If you array is not very big, you are going to be less efficient with that than linear searches. I am not talking order of the algorithm. I am considering just the execution time.

  3. If you can do multiple threads, there is a fairly hefty cost involved in setting those up too. Unless your array is obscenely long, you may not gain any benefits by doing that too. However, if it is multiple millions of items long, you may benefit by splitting it up into smaller chunks. I'd say it'd be rare for this scenario to be of use.

Upvotes: 0

Chris
Chris

Reputation: 710

I believe the answer to the general question is no. Suppose the following structure for A: the i'th element of A ai is given by i*x +1 so there are no elements in A, that are multiples of x. However you never save any time by using any of the "trickery" described above... you basically have to make a choice based on your knowledge of A and x. You basically can choose between O(N) and O(K) where K is the number of possible multiples of x in A, you can achieve O(K) by using hashing, such that the i*x in A becomes a constant time operation (on average...).

Upvotes: 1

tobias_k
tobias_k

Reputation: 82929

This seems to depend very much on the size of x and the number of elements within A, and particularly the number of candidate multiples of x within A.

Binary-searching a specific number within A takes O(log(n)) time (n being the number of elements within A), so if there are k possible multiples of x between the first and the last element of A, it will take O(k * log(N)) to check them all. If that number is smaller than n, you can use this algorithm, otherwise just do a linear search.

(Also, there are probably a few small optimizations to above algorithm. E.g., once you checked x*i (and did not find it), you can use the position where x*i should have been as the lower bound when searching for x*(i+1) instead of the very first element of the array.)

Upvotes: 14

Related Questions