Reputation: 3042
This was an interview question. Initially i found it to be very easy and silly. At-last i failed to solve this. :(
The question is, we have an array which has sequence of 0's followed by sequence of 1's and a sequence of 0's. something like this.
int [] arr ={0,0,0,1,1,1,0,0,0}
Now, find the number of 1's in the array in log(n) time.
AnyIdea ? :(
Upvotes: 15
Views: 2312
Reputation: 18109
One proof that the complexity is not O(log n) would be if we can find an example where the complexity is O(n).
Consider an array containing only one 1. The problem of counting the 1s would be equal to the problem of locating that single 1. When probing the array we would get no extra information besides whether it is a 1 or a 0. We would not know if we probed before or after the 1. This means that to guarantee we would find the 1 we would have to probe the array n times, O(n).
Hence not O(log n).
Upvotes: 1
Reputation: 206667
You said,
we have an array which has sequence of 0's followed by sequence of 1's and a sequence of 0's.
Since your array is arranged in that form, there is a way to obtain the number of 1's in the array in less than O(N) time.
There are three steps:
We know that all the ones are in the middle. Find an index in the array that holds 1. Call the index the pivot. I am using a method that I don't have a name for. Here's the logic:
1.1. Check whether the array[N/2]
is one. If so, stop. N/2
is the pivot.
1.2. Check values of array[N*1/4]
and array[N*3/4]
. If any of it is 1, stop. The first index where we find a 1 is the pivot.
1.3 Check values of array[N*1/8]
, array[N*3/8]
, array[N*5/8]
and array[N*7/8]
. If any of it is 1, stop. The first index where we find a 1 is the pivot.
1.4. Keep on searching in this fashion until we find an index that has 1.
I don't have the background to determine the complexity of this algorithm but I believe it is better than O(N).
We know that the start of the 1's have to be in the range [0:pivot]
. Find the start using the method of bisection (an log(N) operation). Call the index where 1's start begin
.
We know that the end of the 1's have to be in the range [pivot+1:end-1]
. Find the end using the method of bisection (an log(N) operation). Call the index where the 0's start after the pivot end
.
The number of 1's in the array are end-begin
.
Here's a version of the code:
#include <stdio.h>
#include <iostream>
#include <algorithm>
#include <stdlib.h>
#include <time.h>
#include <assert.h>
int findBegin(int* begin, int* end, int startIndex)
{
std::cout << "*begin: " << *begin << ", *(end-1): " << *(end-1) << ", startIndex: " << startIndex << ", distance: " << (end-begin) << std::endl;
if ( (end - begin) < 2 )
{
assert(*begin == 1);
return startIndex;
}
int half = (end-begin)/2;
if ( *(begin+half-1) == 1 )
{
return findBegin(begin, begin+half, startIndex);
}
else
{
return findBegin(begin+half, end, startIndex+half);
}
}
int findEnd(int* begin, int* end, int startIndex)
{
std::cout << "*begin: " << *begin << ", *(end-1): " << *(end-1) << ", startIndex: " << startIndex << ", distance: " << (end-begin) << std::endl;
if ( (end - begin) < 2 )
{
assert(*begin == 1);
return startIndex+1;
}
int half = (end-begin)/2;
if ( *(begin+half) == 0 )
{
return findEnd(begin, begin+half, startIndex);
}
else
{
return findEnd(begin+half, end, startIndex+half);
}
}
int findAOne(int* begin, int startIndex, int size)
{
int start = size/2;
int step = size;
while ( true )
{
for ( int i = start; i < size; i += step )
{
std::cout << "Checking for 1 at index: " << i << std::endl;
if ( begin[i] == 1 )
{
return i;
}
}
start = start/2;
step = step/2;
}
// Should not come here unless there are no 1's in the array.
return -1;
}
void findOnes(int array[], int N)
{
int pivot = findAOne(array, 0, N);
if ( pivot < 0 )
{
return;
}
std::cout << "Index where 1 is found: " << pivot << "\n\n";
int begin = findBegin(array, array+pivot+1, 0);
std::cout << "Done looking for begin\n\n";
int end = findEnd(array+pivot+1, array+N, pivot);
std::cout << "Done looking for end\n\n";
// Print the bounds of the 1's that we found.
std::cout << "begin of 1's found: " << begin << std::endl;
std::cout << "end of 1's found: " << end << std::endl;
}
void fillData(int array[], int N)
{
srand(time(NULL));
int end1 = rand()%N;
int end2 = rand()%N;
int begin = std::min(end1, end2);
int end = std::max(end1, end2);
// Print the bounds of where the 1's are filled.
std::cout << "begin of 1's filled: " << begin << std::endl;
std::cout << "end of 1's filled: " << end << std::endl;
for ( int i = 0; i != begin; ++i )
{
array[i] = 0;
}
for ( int i = begin; i != end; ++i )
{
array[i] = 1;
}
for ( int i = end; i != N; ++i )
{
array[i] = 0;
}
}
int main(int argc, char** argv)
{
int N = atoi(argv[1]);
int* array = new int[N];
// Fill the array with 1's in the middle.
fillData(array, N);
// Find the ones.
findOnes(array, N);
delete [] array;
}
Output of sample run with 1000000 points:
begin of 1's filled: 972096 end of 1's filled: 998629 Checking for 1 at index: 500000 Checking for 1 at index: 250000 Checking for 1 at index: 750000 Checking for 1 at index: 125000 Checking for 1 at index: 375000 Checking for 1 at index: 625000 Checking for 1 at index: 875000 Checking for 1 at index: 62500 Checking for 1 at index: 187500 Checking for 1 at index: 312500 Checking for 1 at index: 437500 Checking for 1 at index: 562500 Checking for 1 at index: 687500 Checking for 1 at index: 812500 Checking for 1 at index: 937500 Checking for 1 at index: 31250 Checking for 1 at index: 93750 Checking for 1 at index: 156250 Checking for 1 at index: 218750 Checking for 1 at index: 281250 Checking for 1 at index: 343750 Checking for 1 at index: 406250 Checking for 1 at index: 468750 Checking for 1 at index: 531250 Checking for 1 at index: 593750 Checking for 1 at index: 656250 Checking for 1 at index: 718750 Checking for 1 at index: 781250 Checking for 1 at index: 843750 Checking for 1 at index: 906250 Checking for 1 at index: 968750 Checking for 1 at index: 15625 Checking for 1 at index: 46875 Checking for 1 at index: 78125 Checking for 1 at index: 109375 Checking for 1 at index: 140625 Checking for 1 at index: 171875 Checking for 1 at index: 203125 Checking for 1 at index: 234375 Checking for 1 at index: 265625 Checking for 1 at index: 296875 Checking for 1 at index: 328125 Checking for 1 at index: 359375 Checking for 1 at index: 390625 Checking for 1 at index: 421875 Checking for 1 at index: 453125 Checking for 1 at index: 484375 Checking for 1 at index: 515625 Checking for 1 at index: 546875 Checking for 1 at index: 578125 Checking for 1 at index: 609375 Checking for 1 at index: 640625 Checking for 1 at index: 671875 Checking for 1 at index: 703125 Checking for 1 at index: 734375 Checking for 1 at index: 765625 Checking for 1 at index: 796875 Checking for 1 at index: 828125 Checking for 1 at index: 859375 Checking for 1 at index: 890625 Checking for 1 at index: 921875 Checking for 1 at index: 953125 Checking for 1 at index: 984375 Index where 1 is found: 984375 *begin: 0, *(end-1): 1, startIndex: 0, distance: 984376 *begin: 0, *(end-1): 1, startIndex: 492188, distance: 492188 *begin: 0, *(end-1): 1, startIndex: 738282, distance: 246094 *begin: 0, *(end-1): 1, startIndex: 861329, distance: 123047 *begin: 0, *(end-1): 1, startIndex: 922852, distance: 61524 *begin: 0, *(end-1): 1, startIndex: 953614, distance: 30762 *begin: 0, *(end-1): 1, startIndex: 968995, distance: 15381 *begin: 0, *(end-1): 1, startIndex: 968995, distance: 7690 *begin: 0, *(end-1): 1, startIndex: 968995, distance: 3845 *begin: 0, *(end-1): 1, startIndex: 970917, distance: 1923 *begin: 0, *(end-1): 1, startIndex: 971878, distance: 962 *begin: 0, *(end-1): 1, startIndex: 971878, distance: 481 *begin: 0, *(end-1): 1, startIndex: 971878, distance: 240 *begin: 0, *(end-1): 1, startIndex: 971998, distance: 120 *begin: 0, *(end-1): 1, startIndex: 972058, distance: 60 *begin: 0, *(end-1): 1, startIndex: 972088, distance: 30 *begin: 0, *(end-1): 1, startIndex: 972088, distance: 15 *begin: 0, *(end-1): 1, startIndex: 972095, distance: 8 *begin: 0, *(end-1): 1, startIndex: 972095, distance: 4 *begin: 0, *(end-1): 1, startIndex: 972095, distance: 2 *begin: 1, *(end-1): 1, startIndex: 972096, distance: 1 Done looking for begin *begin: 1, *(end-1): 0, startIndex: 984375, distance: 15624 *begin: 1, *(end-1): 0, startIndex: 992187, distance: 7812 *begin: 1, *(end-1): 0, startIndex: 996093, distance: 3906 *begin: 1, *(end-1): 0, startIndex: 998046, distance: 1953 *begin: 1, *(end-1): 0, startIndex: 998046, distance: 976 *begin: 1, *(end-1): 0, startIndex: 998534, distance: 488 *begin: 1, *(end-1): 0, startIndex: 998534, distance: 244 *begin: 1, *(end-1): 0, startIndex: 998534, distance: 122 *begin: 1, *(end-1): 0, startIndex: 998595, distance: 61 *begin: 1, *(end-1): 0, startIndex: 998625, distance: 31 *begin: 1, *(end-1): 0, startIndex: 998625, distance: 15 *begin: 1, *(end-1): 0, startIndex: 998625, distance: 7 *begin: 1, *(end-1): 1, startIndex: 998625, distance: 3 *begin: 1, *(end-1): 1, startIndex: 998626, distance: 2 *begin: 1, *(end-1): 1, startIndex: 998627, distance: 1 Done looking for end begin of 1's found: 972096 end of 1's found: 998628
As can be seen from the output, it took 65
steps to find the pivot. After the pivot is found the search for begin
and end
is quick.
Upvotes: 1
Reputation: 105905
You can't. You currently have only three assumptions on the array:
1
,1
,1
s, there is no 0
between them.In order to find something in a array, you can use linear search and binary* search. Linear search doesn't seem appropriate, since you want to achieve logarithmic time.
However, for binary search, you need that arr[i] <= arr[j]
for all i < j
. Since this isn't the case, you don't know which half contains the 1
. This means that you would need to check both sides, which leads to linear complexity.
So why doesn't any kind of binary search work? Before I answer this question, lets put in a quick section about how binary search actually works, since there seems to be a little bit of confusion:
Binary search works so well because it can reduce the size of the problem heavily: in each iteration the problem size gets halved.
For example, lets say we're looking for 5
in the ordered sequence 000011223334456
.
This works perfectly, since we get the following sequence: we first check the middle, which is 2. Therefore we know that the solution is in the right side, and we never have to check the left side. The middle of the right side is four, so we can again cut off the left half. The next middle is 5. We stop:
In the image, the red section will never be inspected. Never. For the complexity, let n be our original problem size. After a single step, our problem has size n/2
. After the k-th step, it has size n/(2^k). So if k = log2 (n), we have reduced our problem to size one. Since we checked only one value in each step, and we have a total of log2(n) steps (rounded up to the next integral), we have logarithmic time complexity.
The short answer is: because your problem isn't sorted. Lets try to use binary search:
What happens after a single step?
Inspecting the middle value doesn't give us any information at all, except for that we know that it isn't a 1
. We don't know whether we need to traverse the left or the right tree. Therefore we need to traverse both of them. And in order to create a deterministic algorithm, we need to fix the order of traversal for all applications. If we traverse right-to-left, we find it relatively fast:
But if the 1
was at the left, we would check almost all elements. You also note that we can't rule out as many nodes as we could in the real binary search.
By the way, the alternating variants also suffer from the same problem, since alternating only means level-based-traversal. Instead of following a path, you would check the nodes based on their levels:
Some of the comments propose parallel/simultaneously search in both trees. While this actually cuts down the total time, time complexity is measured in the sense of turing machines. And at one point or another you're going to run out of strips, or CPUs. Remember folks, this is about theoretical computational time complexity.
If you can't find a single value in logarithmic time, you cannot find a pair of values e.g. (0,1) in logarithmic time. But if you know the position of a single 1, then the left side and and the right side are ordered sets, since they are 000....011..11
and 11....1100...00
and one can use binary search.
After all the discussion, it should be clear that we need linear runtime to find even a single 1.
However, together with an assumption, we can find the edges in logarithmic time and subtract their positions:
1
at position k
(your example suggests one at size/2
)If you're wondering how this helps, have a look at the previous section.
However, without the additional assumption it can't be done.
* or any other n-ary search (n > 2), but they all have logarithmic cost
Upvotes: 6
Reputation: 126378
Best solution I can come up with:
Step 2 is O(log n) trivially. The problem is that step 1 is O(n/k) where k is the number of 1s in the array. This is O(n) if the number of 1s is unrestricted in any way, but if instead the number of 1s is required to be a certain fraction of the array (at least 10% or at least 1% or any lower bound linear in n), then it becomes O(1) average case (though still O(n) worst case)
Upvotes: 4