Jeevi
Jeevi

Reputation: 3042

Find number of 1's in sequence of 0's and 1's

This was an interview question. Initially i found it to be very easy and silly. At-last i failed to solve this. :(

The question is, we have an array which has sequence of 0's followed by sequence of 1's and a sequence of 0's. something like this.

   int [] arr ={0,0,0,1,1,1,0,0,0}

Now, find the number of 1's in the array in log(n) time.

AnyIdea ? :(

Upvotes: 15

Views: 2312

Answers (4)

Martin G
Martin G

Reputation: 18109

One proof that the complexity is not O(log n) would be if we can find an example where the complexity is O(n).

Consider an array containing only one 1. The problem of counting the 1s would be equal to the problem of locating that single 1. When probing the array we would get no extra information besides whether it is a 1 or a 0. We would not know if we probed before or after the 1. This means that to guarantee we would find the 1 we would have to probe the array n times, O(n).

Hence not O(log n).

Upvotes: 1

R Sahu
R Sahu

Reputation: 206667

You said,

we have an array which has sequence of 0's followed by sequence of 1's and a sequence of 0's.

Since your array is arranged in that form, there is a way to obtain the number of 1's in the array in less than O(N) time.

There are three steps:

  1. We know that all the ones are in the middle. Find an index in the array that holds 1. Call the index the pivot. I am using a method that I don't have a name for. Here's the logic:

    1.1. Check whether the array[N/2] is one. If so, stop. N/2 is the pivot.

    1.2. Check values of array[N*1/4] and array[N*3/4]. If any of it is 1, stop. The first index where we find a 1 is the pivot.

    1.3 Check values of array[N*1/8], array[N*3/8], array[N*5/8] and array[N*7/8]. If any of it is 1, stop. The first index where we find a 1 is the pivot.

    1.4. Keep on searching in this fashion until we find an index that has 1.

    I don't have the background to determine the complexity of this algorithm but I believe it is better than O(N).

  2. We know that the start of the 1's have to be in the range [0:pivot]. Find the start using the method of bisection (an log(N) operation). Call the index where 1's start begin.

  3. We know that the end of the 1's have to be in the range [pivot+1:end-1]. Find the end using the method of bisection (an log(N) operation). Call the index where the 0's start after the pivot end.

The number of 1's in the array are end-begin.

Here's a version of the code:

#include <stdio.h>
#include <iostream>
#include <algorithm>
#include <stdlib.h>
#include <time.h>
#include <assert.h>

int findBegin(int* begin, int* end, int startIndex)
{
   std::cout << "*begin: " << *begin << ", *(end-1): " << *(end-1) << ", startIndex: " << startIndex << ", distance: " << (end-begin) << std::endl;

   if ( (end - begin) < 2 )
   {
      assert(*begin == 1);
      return startIndex;
   }

   int half = (end-begin)/2;
   if ( *(begin+half-1) == 1 )
   {
      return findBegin(begin, begin+half, startIndex);
   }
   else
   {
      return findBegin(begin+half, end, startIndex+half);
   }
}

int findEnd(int* begin, int* end, int startIndex)
{
   std::cout << "*begin: " << *begin << ", *(end-1): " << *(end-1) << ", startIndex: " << startIndex << ", distance: " << (end-begin) << std::endl;

   if ( (end - begin) < 2 )
   {
      assert(*begin == 1);
      return startIndex+1;
   }

   int half = (end-begin)/2;
   if ( *(begin+half) == 0 )
   {
      return findEnd(begin, begin+half, startIndex);
   }
   else
   {
      return findEnd(begin+half, end, startIndex+half);
   }
}

int findAOne(int* begin, int startIndex, int size)
{
   int start = size/2;
   int step = size;
   while ( true )
   {
      for ( int i = start; i < size; i += step )
      {
         std::cout << "Checking for 1 at index: " << i << std::endl;
         if ( begin[i] == 1 )
         {
            return i;
         }
      }
      start = start/2;
      step = step/2;
   }

   // Should not come here unless there are no 1's in the array.
   return -1;
}

void findOnes(int array[], int N)
{
   int pivot = findAOne(array, 0, N);
   if ( pivot < 0 )
   {
      return;
   }

   std::cout << "Index where 1 is found: " << pivot << "\n\n";

   int begin = findBegin(array, array+pivot+1, 0);
   std::cout << "Done looking for begin\n\n";

   int end = findEnd(array+pivot+1, array+N, pivot);
   std::cout << "Done looking for end\n\n";

   // Print the bounds of the 1's that we found.
   std::cout << "begin of 1's found: " << begin << std::endl;
   std::cout << "end of 1's found: " << end << std::endl;
}

void fillData(int array[], int N)
{
   srand(time(NULL));
   int end1 = rand()%N;
   int end2 = rand()%N;
   int begin = std::min(end1, end2);
   int end = std::max(end1, end2);

   // Print the bounds of where the 1's are filled.
   std::cout << "begin of 1's filled: " << begin << std::endl;
   std::cout << "end of 1's filled: " << end << std::endl;

   for ( int i = 0; i != begin; ++i )
   {
      array[i] = 0;
   }

   for ( int i = begin; i != end; ++i )
   {
      array[i] = 1;
   }

   for ( int i = end; i != N; ++i )
   {
      array[i] = 0;
   }
}

int main(int argc, char** argv)
{
   int N = atoi(argv[1]);
   int* array = new int[N];

   // Fill the array with 1's in the middle.
   fillData(array, N);

   // Find the ones.
   findOnes(array, N);

   delete [] array;
}

Output of sample run with 1000000 points:

begin of 1's filled: 972096
end of 1's filled: 998629
Checking for 1 at index: 500000
Checking for 1 at index: 250000
Checking for 1 at index: 750000
Checking for 1 at index: 125000
Checking for 1 at index: 375000
Checking for 1 at index: 625000
Checking for 1 at index: 875000
Checking for 1 at index: 62500
Checking for 1 at index: 187500
Checking for 1 at index: 312500
Checking for 1 at index: 437500
Checking for 1 at index: 562500
Checking for 1 at index: 687500
Checking for 1 at index: 812500
Checking for 1 at index: 937500
Checking for 1 at index: 31250
Checking for 1 at index: 93750
Checking for 1 at index: 156250
Checking for 1 at index: 218750
Checking for 1 at index: 281250
Checking for 1 at index: 343750
Checking for 1 at index: 406250
Checking for 1 at index: 468750
Checking for 1 at index: 531250
Checking for 1 at index: 593750
Checking for 1 at index: 656250
Checking for 1 at index: 718750
Checking for 1 at index: 781250
Checking for 1 at index: 843750
Checking for 1 at index: 906250
Checking for 1 at index: 968750
Checking for 1 at index: 15625
Checking for 1 at index: 46875
Checking for 1 at index: 78125
Checking for 1 at index: 109375
Checking for 1 at index: 140625
Checking for 1 at index: 171875
Checking for 1 at index: 203125
Checking for 1 at index: 234375
Checking for 1 at index: 265625
Checking for 1 at index: 296875
Checking for 1 at index: 328125
Checking for 1 at index: 359375
Checking for 1 at index: 390625
Checking for 1 at index: 421875
Checking for 1 at index: 453125
Checking for 1 at index: 484375
Checking for 1 at index: 515625
Checking for 1 at index: 546875
Checking for 1 at index: 578125
Checking for 1 at index: 609375
Checking for 1 at index: 640625
Checking for 1 at index: 671875
Checking for 1 at index: 703125
Checking for 1 at index: 734375
Checking for 1 at index: 765625
Checking for 1 at index: 796875
Checking for 1 at index: 828125
Checking for 1 at index: 859375
Checking for 1 at index: 890625
Checking for 1 at index: 921875
Checking for 1 at index: 953125
Checking for 1 at index: 984375
Index where 1 is found: 984375

*begin: 0, *(end-1): 1, startIndex: 0, distance: 984376
*begin: 0, *(end-1): 1, startIndex: 492188, distance: 492188
*begin: 0, *(end-1): 1, startIndex: 738282, distance: 246094
*begin: 0, *(end-1): 1, startIndex: 861329, distance: 123047
*begin: 0, *(end-1): 1, startIndex: 922852, distance: 61524
*begin: 0, *(end-1): 1, startIndex: 953614, distance: 30762
*begin: 0, *(end-1): 1, startIndex: 968995, distance: 15381
*begin: 0, *(end-1): 1, startIndex: 968995, distance: 7690
*begin: 0, *(end-1): 1, startIndex: 968995, distance: 3845
*begin: 0, *(end-1): 1, startIndex: 970917, distance: 1923
*begin: 0, *(end-1): 1, startIndex: 971878, distance: 962
*begin: 0, *(end-1): 1, startIndex: 971878, distance: 481
*begin: 0, *(end-1): 1, startIndex: 971878, distance: 240
*begin: 0, *(end-1): 1, startIndex: 971998, distance: 120
*begin: 0, *(end-1): 1, startIndex: 972058, distance: 60
*begin: 0, *(end-1): 1, startIndex: 972088, distance: 30
*begin: 0, *(end-1): 1, startIndex: 972088, distance: 15
*begin: 0, *(end-1): 1, startIndex: 972095, distance: 8
*begin: 0, *(end-1): 1, startIndex: 972095, distance: 4
*begin: 0, *(end-1): 1, startIndex: 972095, distance: 2
*begin: 1, *(end-1): 1, startIndex: 972096, distance: 1
Done looking for begin

*begin: 1, *(end-1): 0, startIndex: 984375, distance: 15624
*begin: 1, *(end-1): 0, startIndex: 992187, distance: 7812
*begin: 1, *(end-1): 0, startIndex: 996093, distance: 3906
*begin: 1, *(end-1): 0, startIndex: 998046, distance: 1953
*begin: 1, *(end-1): 0, startIndex: 998046, distance: 976
*begin: 1, *(end-1): 0, startIndex: 998534, distance: 488
*begin: 1, *(end-1): 0, startIndex: 998534, distance: 244
*begin: 1, *(end-1): 0, startIndex: 998534, distance: 122
*begin: 1, *(end-1): 0, startIndex: 998595, distance: 61
*begin: 1, *(end-1): 0, startIndex: 998625, distance: 31
*begin: 1, *(end-1): 0, startIndex: 998625, distance: 15
*begin: 1, *(end-1): 0, startIndex: 998625, distance: 7
*begin: 1, *(end-1): 1, startIndex: 998625, distance: 3
*begin: 1, *(end-1): 1, startIndex: 998626, distance: 2
*begin: 1, *(end-1): 1, startIndex: 998627, distance: 1
Done looking for end

begin of 1's found: 972096
end of 1's found: 998628

As can be seen from the output, it took 65 steps to find the pivot. After the pivot is found the search for begin and end is quick.

Upvotes: 1

Zeta
Zeta

Reputation: 105905

Well, how to put this…

You can't. You currently have only three assumptions on the array:

  • it doesn't start with a 1,
  • it doesn't end with a 1,
  • for any two 1s, there is no 0 between them.

In order to find something in a array, you can use linear search and binary* search. Linear search doesn't seem appropriate, since you want to achieve logarithmic time. However, for binary search, you need that arr[i] <= arr[j] for all i < j. Since this isn't the case, you don't know which half contains the 1. This means that you would need to check both sides, which leads to linear complexity.

Why n-ary (n >= 2) search doesn't work

So why doesn't any kind of binary search work? Before I answer this question, lets put in a quick section about how binary search actually works, since there seems to be a little bit of confusion:

How does binary search work?

Binary search works so well because it can reduce the size of the problem heavily: in each iteration the problem size gets halved.

For example, lets say we're looking for 5 in the ordered sequence 000011223334456.

initial search tree

This works perfectly, since we get the following sequence: we first check the middle, which is 2. Therefore we know that the solution is in the right side, and we never have to check the left side. The middle of the right side is four, so we can again cut off the left half. The next middle is 5. We stop:

search going on

In the image, the red section will never be inspected. Never. For the complexity, let n be our original problem size. After a single step, our problem has size n/2. After the k-th step, it has size n/(2^k). So if k = log2 (n), we have reduced our problem to size one. Since we checked only one value in each step, and we have a total of log2(n) steps (rounded up to the next integral), we have logarithmic time complexity.

Why binary search doesn't work in your situation

The short answer is: because your problem isn't sorted. Lets try to use binary search:

oh oh

What happens after a single step?

middle

Inspecting the middle value doesn't give us any information at all, except for that we know that it isn't a 1. We don't know whether we need to traverse the left or the right tree. Therefore we need to traverse both of them. And in order to create a deterministic algorithm, we need to fix the order of traversal for all applications. If we traverse right-to-left, we find it relatively fast:

whoops

But if the 1 was at the left, we would check almost all elements. You also note that we can't rule out as many nodes as we could in the real binary search.

By the way, the alternating variants also suffer from the same problem, since alternating only means level-based-traversal. Instead of following a path, you would check the nodes based on their levels:

alternating

Some of the comments propose parallel/simultaneously search in both trees. While this actually cuts down the total time, time complexity is measured in the sense of turing machines. And at one point or another you're going to run out of strips, or CPUs. Remember folks, this is about theoretical computational time complexity.

Why is it so important to find a single 1?

If you can't find a single value in logarithmic time, you cannot find a pair of values e.g. (0,1) in logarithmic time. But if you know the position of a single 1, then the left side and and the right side are ordered sets, since they are 000....011..11 and 11....1100...00 and one can use binary search.

So can one actually solve it in logarithmic time?

After all the discussion, it should be clear that we need linear runtime to find even a single 1.

However, together with an assumption, we can find the edges in logarithmic time and subtract their positions:

  • the array has a 1 at position k (your example suggests one at size/2)

If you're wondering how this helps, have a look at the previous section.

However, without the additional assumption it can't be done.


* or any other n-ary search (n > 2), but they all have logarithmic cost

Upvotes: 6

Chris Dodd
Chris Dodd

Reputation: 126378

Best solution I can come up with:

  1. Use random probing or a random hash probe to find a 1 in the array
  2. Use binary searches from there to find the first and last 1s in the array

Step 2 is O(log n) trivially. The problem is that step 1 is O(n/k) where k is the number of 1s in the array. This is O(n) if the number of 1s is unrestricted in any way, but if instead the number of 1s is required to be a certain fraction of the array (at least 10% or at least 1% or any lower bound linear in n), then it becomes O(1) average case (though still O(n) worst case)

Upvotes: 4

Related Questions