Getting largest k elements in an array outputted in sorted order in O(n + klogk) using heaps?

Question

I'm trying to get the runtime of my algorithm down to O(n + klogk), but I can't seem to do this. I'm getting O(n + nlogk) by using a min heap. Basically algorithm is as follows:

Build a Min Heap MH of the first k elements (arr[0] to arr[k-1]) of the given array. O(k)
For each element, after the kth element (arr[k] to arr[n-1]), compare it with root of MH. O((n-k)logk)
Finally, MH has k largest elements and root of the MH is the kth largest element. Call extractMin k times to get O(klogk).

This gets me O(k + (n-k)Logk + kLogk), which equals O(k + nlogk). But I need O(n + klogk).

I don't know what to do to get it faster using the min heap. Any suggestions?

meowgoesthedog · Accepted Answer

The problem with using this min-heap approach is that you have to constantly process the entire rest of the array, giving you an n factor in-front of the log k.

There is a way to reduce the work done in actually extracting the first k elements, by using a divide-and-conquer strategy similar to that of quicksort. This exponentially decreases the part of the array which has to be processed.

Select a random pivot element.
Partition the array into those bigger and smaller than the pivot, just as in quicksort. Say there are L in the left partition and R in the right.
If k == L then we are done: the first k elements in the array are the smallest.
If k < L then continue from 1, processing the left partition only.
If k > L then continue from 1, processing the right partition with a new value of k' = k - L.

In case the above description is vague (edit: it probably is), this method is very well known, and there are plenty of sources out there (here's one from SO, with code).

Assuming that in the average case the pivot approximately divides the array into two equal halves (this is also the ideal case), the time complexity recurrence relation is given by T(m) = T(m/2) + O(m), which means that the above algorithm is O(n).

After this, the first k elements of the array are unordered, so just sort them at O(k log k). Therefore the total complexity is O(n + k log k), as required.

Getting largest k elements in an array outputted in sorted order in O(n + klogk) using heaps?

Answers (1)

Related Questions