Dubby
Dubby

Reputation: 1144

Partial sorting to find the kth largest/smallest elements

Source : Wikipedia

A streaming, single-pass partial sort is also possible using heaps or other priority queue data structures. First, insert the first k elements of the input into the structure. Then make one pass over the remaining elements, add each to the structure in turn, and remove the largest element. Each insertion operation also takes O(log k) time, resulting in O(n log k) time overall.

  1. How is this different from the case where we first heapify the complete input array in O(n) time and then extract the minimum of the heap k times.
  2. I don't understand the part where it says to make one pass over the remaining elements, add each to the structure in turn, and remove the largest element. Isn't this the same as the method described in 1) ?

Upvotes: 3

Views: 2895

Answers (1)

Fred Foo
Fred Foo

Reputation: 363547

The suggested method is streaming. It doesn't need to have all the items in memory to run the heapify algorithm, given it O(k) space complexity (but it only finds the top-k items).

A more explicit description of the algorithm (see also the reference WP gives) is

  • given a stream of items:
  • make a heap of the first k elements in the stream,
  • for each element after the first k:
    • push it onto the heap,
    • extract the largest (or smallest) element from the heap and discard it,
  • finally return the k values left in the heap.

By construction, the heap never grows to more than k + 1 elements. The items can be streamed in from disk, over a network, etc., which is not possible with the heapify algorithm.

Upvotes: 6

Related Questions