Reputation: 321
A little task on searching algorithm and complextiy in C. I just want to make sure im right.
I have n natural numbers from 1 to n+1 ordered from small to big, and i need to find the missing one. For example: 1 2 3 5 6 7 8 9 10 11 - ans: 4
The fastest and the simple answer is do one loop and check every number with the number that comes after it. And the complexity of that is O(n) in the worst case.
I thought maybe i missing something and i can find it with using Binary Search. Can anybody think on more efficient algorithm in that simple example? like O(log(n)) or something ?
Upvotes: 0
Views: 84
Reputation:
For a comparison-based algorithm, you can't beat Lg(N)
comparisons in the worst case. This is simply because the answer is a number between 1
and N
and it takes Lg(N)
bits of information to represent such a number. (And a comparison gives you a single bit.)
Unless the distribution of the answers is very skewed, you can't do much better than Lg(N)
on average.
Now I don't see how a non-comparison-based method could exploit the fact that the sequence is ordered, and do better than O(N)
.
Upvotes: 0
Reputation: 36442
There's obviously two answers:
If your problem is a purely theoretical problem, especially for large n
, you'd do something like a binary search and check whether the middle between the two last boundaries is actually (upper-lower)/2
.
However, if this is a practical question, for modern systems executing programs written in C
and compiled by a modern, highly optimizing compiler for n << 10000
, I'd assume that the linear search approach is much, much faster, simply because it can be vectorized so easily. In fact, modern CPUs have instructions to take e.g. each
and so on, which very neatly lends itself to the fact that CPUs and memory controllers prefetch linear memory, and thus, jumping around in logarithmically descending step sizes can have an enormous performance impact.
So: For large n
, where linear search would be impractical, go for the binary search approach; for n
where that is questionable, go for the linear search. If you not only have SIMD capabilities but also multiple cores, you will want to split your problem. If your problem is not actually exactly 1 missing number, you might want to use a completely different approach ... The whole O(n)
business is generally more of a benchmark usable purely for theoretical constructs, and unless the difference is immensely large, is rarely the sole reason to pick a specific algorithm in a real-world implementation.
Upvotes: 1