Reputation: 1504
This is a problem from the Cormen text, but I'd like to see if there are any other solutions.
Given an array with n distinct numbers, you need to find the m largest ones in the array, and have them in sorted order. Assume n and m are large, but grow differently. In particular, you need to consider below the situations where m = t*n, where t is a small number, say 0.1, and then the possibility m = √n.
The solution given in the book offers 3 options:
These all make sense, and they all have their pros and cons, but I'm wondering, is there another way to do it? It doesn't have to be better or faster, I'm just curious to see if this is a common problem with more solutions, or if we are limited to those 3 choices.
Upvotes: 1
Views: 224
Reputation: 46409
A different approach.
Take the first m numbers, and turn them into a min heap. Run through the array, if its value exceeds the min of the top m then you extract the min value and insert the new one. When you reach the end of the array you can then extract the elements into an array and reverse it.
The worst case performance of this version is O(n log(m))
placing it between the first and second methods for efficiency.
The average case is more interesting. On average only O(m log(n/m))
of the elements are going to pass the first comparison test, each time incurring O(log(m))
work so you get O(n + m log(n/m) log(m))
work, which puts it between the second and third methods. But if n
is many orders of magnitude greater than m
then the O(n)
piece dominates, and the O(n)
median select in the third approach has worse constants than the one comparison per element in this approach, so in this case this is actually the fastest!
Upvotes: 3
Reputation: 79461
The time complexities of the three approaches you have mentioned are as follows.
So option (3) is definitely better than the others in terms of asymptotic complexity, since m <= n. When m is small, the difference between (2) and (3) is so small it would have little practical impact.
As for other ways to solve the problem, there are infinitely many ways you could, so the question is somewhat poor in this regard. Another approach I can think of as being practically simple and performant is the following.
I would only do this if m was very small though. Option (2) from your original list is also extremely easy to implement if you have a max-heap implementation and will work great.
Upvotes: 3