Reputation: 664
I have a minimum heap with n
elements and want to find k
smallest numbers from this heap. What is the worst-case complexity?
Here is my approach: somewhere on the stackoverflow I read that complexity of finding i-th smallest number in a min heap is O(i)
. So if we would like to find n-1
smallest numbers (n is pointless, since it would be the entire heap), the total complexity would look something like this:
O(n-1)+O(n-2)+O(n-3)+…+O(2)+O(1)=O((1+n-1)*(n/2))=O(n^2)
Is this correct?
Upvotes: 6
Views: 6328
Reputation: 46408
No, the time is much better than that. O(k log(n))
very easily, and O(k)
if you're smart.
Finding and removing the smallest element from the heap is O(log(n))
. This leads to O(k log(n))
time very easily.
BUT the result that you are thinking about is Frederickson, G.N.: An Optimal Algorithm for Selection in a Min-Heap(?) that shows how to find the value of the k
th smallest number in time O(k)
. Now you use the fact that a heap is a binary tree and start from the root and do a recursive search for every number that you find which is smaller than that largest. Then fill out the rest of your list with copies of the k
'th smallest number.
In that search you will wind up looking at up to k-1
that are at most that size, and for some of them you will look at up to 2 children that are too large to bother with, for a maximum of 3k-3
elements. This makes the whole algorithm O(k)
.
That link died due to bitrot. Hopefully https://www.sciencedirect.com/science/article/pii/S0890540183710308 lasts longer.
Upvotes: 8
Reputation: 372814
There is a recent (2019) algorithm that finds the k smallest elements of a binary min-heap in time O(k) that uses the soft heap data structure. This is a dramatically simpler algorithm than Frederickson’s original O(k)-time heap selection algorithm. See Selection from Heaps, Row-Sorted Matrices, and X+Y Using Soft Heaps by Kaplan et al.
Upvotes: 3
Reputation: 1286
I am doubtful that it is possible to identify the kth smallest element in time O(k). The best I have seen before is an O(k log k) algorithm, which also conveniently solves your problem by identifying the k smallest elements. You can read the details in another answer on StackOverflow or on Quora.
The basic idea is to manipulate a secondary heap. Initially, this secondary heap contains only the root of the original heap. At each step, the algorithm deletes the min of the secondary heap and inserts its two original children (that is, its children from the original heap) into the secondary heap.
This algorithm has a nice property that on step i, the element it deletes from the secondary heap is the ith smallest element overall. So after k steps, the set of items which have been deleted from the secondary heap are exactly the k smallest elements. This algorithm is O(k log k) because there are O(k) deletions/insertions into a secondary heap which is upper bounded in size by O(k).
EDIT: I stand corrected! btilly's answer provides a solution in O(k) using a result from this paper.
Upvotes: 4