committedandroider
committedandroider

Reputation: 9261

Wouldn't this algorithm run in O(m log n)?

I am working on an interview question from Glassdoor Software Engineer

The question is

    Given a list of one million numbers, how will you find the top n numbers from the list in an efficient way

Here is a solution an author gave from same link

  1. create a min heap
  2. take first n of m elements and place in the heap (O(n))
  3. for each (m-n) remaining elements, if it is greater than find-min of the heap, insert into heap and delete min. (worst case O((m-n)log n) if the list is sorted.

net result is you can do this in O(n) memory usage and worst-case O((m-n)logn) runtime.

      I agree with the author's algorithm and the author's assessment of the space complexity of this algorithm. What I have an issue with is the author's analysis of the runtime for insertion into heap and overall time

For the step "take first n of m elements and place in the heap", wouldn't that run in O(nlogn)? At least according to my class notes Heap Add, insertion would be O(logn) and because you are inserting n elements, the runtime of that whole step would be O(nlogn).

Taking that into consideration, wouldn't the overall runtime of this entire algorithm be, using big oh addition from Big Oh Addition

 O(nlogn + (m-n)logn) = O(mlogn) 

Upvotes: 3

Views: 1101

Answers (2)

Louis Wasserman
Louis Wasserman

Reputation: 198163

Using that approach to building a heap, yes, but there is an O(n) algorithm for converting an array to a heap. See http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap for details.

That said, an O(m) time, O(n) memory solution exists for this problem, implemented by e.g. Guava's Ordering.leastOf. One implementation is

  • create a buffer, an array of size 2n
  • loop through the original array, adding elements to the buffer
  • whenever the buffer is full, use an O(n) quickselect to keep only the highest n elements from the buffer and discard the rest.
  • use one final quickselect to extract the highest n elements from the buffer

This requires O(m/n) quickselects, each of which take O(n), for O(m) time total.

Upvotes: 3

IVlad
IVlad

Reputation: 43467

For the step "take first n of m elements and place in the heap", wouldn't that run in O(nlogn)?

Not necessarily. You can create a heap from n elements in O(n). See here for how that can be achieved.

So you'd have O(n + (m - n)log n) = O((m - n)log n) = O(m log n). The last step is correct only if n is considered to be a constant, otherwise you should keep it as m - n, as the author has.

Followup question: can you solve the whole problem in O(m)?

Upvotes: 2

Related Questions