AlonAlon
AlonAlon

Reputation: 215

Algorithm for finding if there's a "common number"

Let an array with the size of n. We need to write an algorithm which checks if there's a number which appears at least n/loglogn times.

I've understood that there's a way doing it in O(n*logloglogn) which goes something like this:

  1. Find the median using select algorithm and count how many times it appears. if it appears more than n/loglogn we return true. It takes O(n).
  2. Partition the array according the median. It takes O(n)
  3. Apply the algorithm on both sides of the partition (two n/2 arrays).
  4. If we reached a subarray of size less than n/loglogn, stop and return false.

Questions:

  1. Is this algorithm correct?
  2. The recurrence is: T(n) = 2T(n/2) + O(n) and the base case is T(n/loglogn) = O(1). Now, the largest number of calls in the recurrence-tree is O(logloglogn) and since every call is O(n) then the time complexity is O(n*logloglogn). Is that correct?

Upvotes: 0

Views: 69

Answers (1)

amit
amit

Reputation: 178491

The suggested solution works, and the complexity is indeed O(n/logloglog(n)).

Let's say a "pass i" is the running of all recursive calls of depth i. Note that each pass requires O(n) time, since while each call is much less than O(n), there are several calls - and overall, each element is processed once in each "pass".

Now, we need to find the number of passes. This is done by solving the equation:

n/log(log(n))  = n / 2^x
<->
n/log(log(n)) * 2^x = n 

And the idea is each call is dividing the array by half until you get to the predefined size of n/log(log(n)).

This problem is indeed solved for x in O(n/log(log(log(n))), as you can see in wolfram alpha, and thus the complexity is indeed O(nlog(log(log(n))))

As for correctness - that's because if an element repeats more than the required - it must be in some subarray with size greater/equals the required size, and by reducing constantly the size of the array, you will arrive to a case at some point where #repeats <= size(array) <= #repeats - at this point, you are going to find this element as the median, and find out it's indeed a "frequent item".

Some other approach, in O(n/log(log(n)) time - but with great constants is suggested by Karp-Papadimitriou-Shanker, and is based on filling a table with "candidates" while processing the array.

Upvotes: 1

Related Questions