How to find the most frequent number and its frequency in an array in range L,R most efficiently?

Question

Lets say we are given an array A[] of length N and we have to answer Q queries which consists of two integers L,R. We have to find the number from A[L] to A[R] which has its frequency at least (R-L+1)/2. If such number doesn't exist then we have to print "No such number"

I could think of only O(Q*(R-L)) approach of running a frequency counter and first obtaining the most frequent number in the array from L to R. Then count its frequency.

But more optimization is needed.

Constraints: 1<= N <= 3*10^5, ,1<=Q<=10^5 ,1<=L<=R<=N

kraskevich · Accepted Answer

I know an O((N + Q) * sqrt(N)) solution:

Let's call a number heavy if at occurs at least B times in the array. There are at most N / B heavy numbers in the array.
If the query segment is "short" (R - L + 1 < 2 * B), we can answer it in O(B) time (by simply iterating over all elements of the range).
If the query segment is "long" (R - L + 1 >= 2 * B), a frequent element must be heavy. We can iterate over all heavy numbers and check if at least one then fits (to do that, we can precompute prefix sums of number of occurrences for each heavy element and find the number of its occurrences in a [L, R] segment in constant time).

If we set B = C * sqrt(N) for some constant C, this solution runs in O((N + Q) * sqrt(N)) time and uses O(N * sqrt(N)) memory. With properly chosen C, and may fit into time and memory limit.

There is also a randomized solution which runs in O(N + Q * log N * k) time.

Let's store a vector of position of occurrences for each unique element in the array. Now we can find the number of occurrences of a fixed element in a fixed range in O(log N) time (two binary searches over the vector of occurrences).
For each query, we'll do the following:
- pick a random element from the segment
- Check the number of its occurrences in O(log N) time as described above
- If it's frequent enough, we are done. Otherwise, we pick another random element and do the same
- If a frequent element exists, the probability not to pick it is no more than 1 / 2 for each trial. If we do it k times, the probability not to find it is (1 / 2) ^ k

With a proper choice of k (so that O(k * log N) per query is fast enough and (1 / 2) ^ k is reasonably small), this solution should pass.

Both solutions are easy to code (the first just needs prefix sums, the second only uses a vector of occurrences and binary search). If I had to code one them, I'd pick the latter (the former can be more painful to squeeze in time and memory limit).

How to find the most frequent number and its frequency in an array in range L,R most efficiently?

Answers (1)

Related Questions