Reputation: 6958

Merge k sorted arrays - comparing two solutions

If I have K sorted arrays of N elements each, e.g.

[0, 1, 2]
[1, 6, 8]
[10, 11, 12]

I know that I can merge them with a heap: iterate over every list, insert every element into the heap, then repeatedly extract the minimum. That takes O(KN * log(KN)) time.
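For concreteness, here is a minimal Python sketch of this first approach. The function name `merge_all_in_heap` is only illustrative, and it assumes the lists hold mutually comparable values.

```python
import heapq


def merge_all_in_heap(arrays):
    """Push every element into one heap, then pop them all.

    With K lists of N elements this does KN pushes and KN pops on a heap
    of up to KN entries, hence O(KN * log(KN)) time and O(KN) extra space.
    """
    heap = []
    for arr in arrays:
        for x in arr:
            heapq.heappush(heap, x)
    # Popping repeatedly yields the elements in ascending order.
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

Note that this never uses the fact that the inputs are already sorted.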

I searched online, and another popular solution seems to use a min-heap of only K elements: insert the first item of each of the K lists into the heap, then repeatedly extract the minimum, advance the pointer in the list that owned that minimum, and insert that list's next element.
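A rough Python sketch of this second approach might look like the following. The name `merge_k_heap` and the `(value, list index, position)` tuple layout are my own choices for illustration, not part of any particular reference implementation.

```python
import heapq


def merge_k_heap(arrays):
    """Merge sorted lists with a heap that holds at most one entry per list."""
    # Each heap entry is (value, list index, position within that list).
    # The list index also breaks ties between equal values.
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, pos = heapq.heappop(heap)
        out.append(value)
        # Advance the pointer in the list that owned the minimum.
        if pos + 1 < len(arrays[i]):
            heapq.heappush(heap, (arrays[i][pos + 1], i, pos + 1))
    return out
```

The heap never grows beyond K entries, so each of the KN elements costs O(log(K)) to process.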

Aside from the lower memory requirement (O(K) extra space in the second case), is the second method also more efficient time-wise?

Optional bonus points: is there an even better algorithm than the ones above?

Upvotes: 4

Answers (3)

Óscar López

Reputation: 236140

The first method is fine when you have enough memory to hold and sort all the input lists, but it'd be even simpler to perform a k-way merge directly on the already-sorted lists, with a bit of extra space (a list of K indices) to keep track of the current position in each input list. Each of the KN output elements is found by scanning the K current candidates, so that's an O(K^2 * N) solution.
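As an illustration, a k-way merge with per-list indices and a linear scan could be sketched like this in Python (the name `merge_k_scan` is hypothetical):

```python
def merge_k_scan(arrays):
    """k-way merge that scans the K current candidates for every output element."""
    positions = [0] * len(arrays)  # current index into each input list
    out = []
    while True:
        best = None  # index of the list whose current element is smallest
        for i, arr in enumerate(arrays):
            if positions[i] < len(arr):
                if best is None or arr[positions[i]] < arrays[best][positions[best]]:
                    best = i
        if best is None:  # every list is exhausted
            return out
        out.append(arrays[best][positions[best]])
        positions[best] += 1
```

The inner loop is the O(K) scan that is repeated for each of the KN output elements.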

Which is better, the first method or the k-way merge, depends on how large K is compared to N, and let's not forget the O(KN) cost of building a heap for the first method. To give an idea (using the natural logarithm and ignoring constant factors):

k=5; n=100
k*n*log(k*n)
=> 3107
k*k*n
=> 2500

k=100; n=100
k*n*log(k*n)
=> 92103
k*k*n
=> 1000000

The second method uses less memory, and that's very important! It's the way to go when the input lists don't fit in memory: we take one element from each list, put it in the heap, determine the next one that goes into the final result, write it to the output, and update the heap accordingly. That's O(KN * log(K)) in complexity. Again, to give an idea:

k=5; n=100
k*n*log(k)
=> 804

k=100; n=100
k*n*log(k)
=> 46051
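The figures above can be reproduced with a few lines of Python, assuming the natural logarithm and truncation to an integer. The helper name `estimates` and the dictionary keys are made up for this sketch, and the values are rough operation counts that ignore constant factors.

```python
import math


def estimates(k, n):
    """Rough operation counts for the three approaches (constants ignored)."""
    return {
        "all_in_heap": int(k * n * math.log(k * n)),  # first method
        "scan_merge": k * k * n,                      # k-way merge with a linear scan
        "k_heap": int(k * n * math.log(k)),           # second method (K-element heap)
    }


for k, n in [(5, 100), (100, 100)]:
    print(k, n, estimates(k, n))
```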

Bottom line: Use a k-way merge instead of the first method when the input fits in memory and k is small, and as @btilly points out, the second method is theoretically the best of them all, but practical considerations might make a k-way merge faster. As usual: the best strategy is to profile with some real data, and pick the winner!

Upvotes: 2

btilly

Reputation: 46507

The first approach is O(KN * log(KN)). The second is O(KN * log(K)), and so is better. In general, it is impossible for a comparison-based merge to do better than that.

That said, you can sometimes improve on it in practice. Instead of dumping the minimum elements into a heap, create a tree of two-way merges like merge sort does. Then add logic so that, when you seem to be pulling repeatedly from one side of a merge, you try to jump ahead and look for a run that can be copied in one go.

The win can be significant if K is large, comparisons are expensive, and your data has lots of runs.

See https://en.wikipedia.org/wiki/Timsort for an example of a sorting algorithm which tries something like this, and has been finely tuned for a lot of real world use cases.
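To make the idea concrete, here is a simplified Python sketch of a tree of two-way merges that copies whole runs. It is not Timsort's actual galloping logic: as an assumption for brevity, it uses a plain binary search from the standard `bisect` module on every step, whereas a tuned implementation would only start jumping ahead after a streak of wins from one side. It also materializes intermediate lists, so it is not a streaming merge. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def merge_two_runs(a, b):
    """Merge two sorted lists, copying a whole run from one side at a time."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            # Jump ahead in a: everything up to `end` is <= b[j].
            end = bisect_right(a, b[j], i)
            out.extend(a[i:end])
            i = end
        else:
            # Jump ahead in b: everything up to `end` is < a[i].
            end = bisect_left(b, a[i], j)
            out.extend(b[j:end])
            j = end
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def merge_tree(arrays):
    """Merge K sorted lists pairwise, level by level, like merge sort."""
    if not arrays:
        return []
    level = list(arrays)
    while len(level) > 1:
        merged = [merge_two_runs(level[p], level[p + 1])
                  for p in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd list out moves up unchanged
            merged.append(level[-1])
        level = merged
    return list(level[0])
```

When the data has long runs, each run costs one binary search plus a bulk copy instead of one comparison per element, which is where the savings would come from if comparisons are expensive.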

Upvotes: 1

SaiBot

Reputation: 3755

The second version should have a runtime of O(KN * log(K)), since you perform an O(log(K)) heap operation for each of the N*K elements. So yes, it is faster. I cannot think of a more efficient way to solve this problem.
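If you happen to be working in Python, the standard library's `heapq.merge` already merges sorted inputs lazily, so a short usage example with the arrays from the question would be:

```python
import heapq

arrays = [[0, 1, 2], [1, 6, 8], [10, 11, 12]]

# heapq.merge returns an iterator over the merged values;
# wrapping it in list() materializes the result.
merged = list(heapq.merge(*arrays))
print(merged)
```

Because it returns an iterator, the merged output does not have to be held in memory all at once.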

Upvotes: 3
