user1726707

Reputation: 95

Comparing sorting algorithms when the data items are arriving one at a time rather than being present already

I am at an intermediate level in algorithms. Recently, while comparing different sorting algorithms, the following question struck me.

How do you compare different sorting algorithms when the data items arrive one at a time rather than being present already?

I have compared a few myself, but I am not sure whether my approach is right.

Insertion sort: As the name itself suggests, it offers a natural solution to the problem, with O(n^2) complexity.
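
For concreteness, here is a minimal sketch of what I mean (Python; the function name and the toy input stream are made up for illustration). Each arriving item is inserted into its sorted position, which is what makes insertion sort naturally "online":

```python
import bisect

def online_insertion_sort(stream):
    sorted_items = []
    for item in stream:
        # bisect.insort finds the slot by binary search (O(log n)),
        # but shifting the tail of the list to make room is O(n),
        # so the whole run is O(n^2) in the worst case.
        bisect.insort(sorted_items, item)
    return sorted_items

print(online_insertion_sort(iter([5, 2, 8, 1, 9])))  # [1, 2, 5, 8, 9]
```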

Heap sort: The technique is to grow the heap as each data item is pushed. Each push corresponds to a sift-up operation with O(log n) complexity; then we exchange the first element with the last element and heapify to restore the heap property. Heapify is again O(log n), so I get an overall complexity of O(n log n log n). But if we have all the data items present already, it is only O(n log n), because after building the heap we only perform heapify operations on the data items.
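
In code, the incremental version I have in mind looks roughly like this (Python, using the standard heapq module; the name online_heap_sort is made up):

```python
import heapq

def online_heap_sort(stream):
    heap = []
    for item in stream:
        heapq.heappush(heap, item)  # sift-up as each item arrives: O(log n)
    # Once the stream ends, pop repeatedly to produce sorted output;
    # each pop re-heapifies in O(log n).
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(online_heap_sort(iter([5, 2, 8, 1, 9])))  # [1, 2, 5, 8, 9]
```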

Selection sort: It needs all the data items before sorting can begin, so I assume selection sort offers no solution to our problem.

Tree sort: The dominant step in this technique is building the tree, which has a worst-case time complexity of O(n^2); an in-order traversal then produces the sorted output.
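
Here is a sketch of the version I am picturing, with an unbalanced binary search tree built as the items arrive (Python; all names are made up):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    # Unbalanced BST insert: O(log n) expected for random arrivals,
    # but O(n) per insert (O(n^2) total) if the items arrive sorted.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def in_order(root, out):
    if root is not None:
        in_order(root.left, out)
        out.append(root.key)
        in_order(root.right, out)

root = None
for item in [5, 2, 8, 1, 9]:   # items arriving one at a time
    root = insert(root, item)
out = []
in_order(root, out)
print(out)                     # [1, 2, 5, 8, 9]
```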

I am not very sure about the other algorithms.

I am posting this question because I am looking for a thorough grasp of these sorting techniques. Pardon me if you find any discrepancies in my question or my comparisons.

Upvotes: 1

Views: 559

Answers (2)

Jerry Coffin

Reputation: 490653

Building a heap one item at a time doesn't add any extra complexity (and your O(n log n log n) doesn't make sense to me at all). In fact, the normal way to build a heap is to add one item at a time to the heap anyway. That is to say, even if you have all your items available to start with, you start by building a heap of the first two, then adding the third, the fourth, and so on. If you're receiving the items one at a time, it doesn't cause any extra work at all. It ends up O(N log N), just like always.

Just to be clear: swapping an item to the first position and then sifting it into the tree is not for when you're adding items -- it's for when you're removing items. That is to say: when you're done inserting all the items into the heap, you're ready to produce sorted output. You do that by swapping the item at the base of the heap (i.e., the first item in the array) to the end of the array, then taking the item that started out at the end of the array, and sifting it upward in the heap until the heap property is restored.

When you're adding new items to the heap, you add them at the end, then sift them downward to restore the heap property. In the typical case where they're already all available, you do that by starting at the beginning of the array, and having a partition between items that form a heap, and items that are still just a jumbled mess. To build all the items into a heap, you move that partition over one item, sift that item down to where it belongs, and repeat. With an "online" version that doesn't have all the data immediately available, all that changes is that after you sift an item into the heap, you have to wait for the next item to arrive before you sift it into the heap (where you'd normally insert the next item immediately).
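To make the mechanics concrete, here is a minimal sketch of both phases with a max-heap in a Python list (the function names are mine; "toward the root" means toward index 0, the position this answer calls the base of the heap):

```python
def sift_toward_root(heap, i):
    # After appending at index i, move the new item toward the root
    # until its parent is no smaller -- O(log n) per arrival.
    while i > 0 and heap[i] > heap[(i - 1) // 2]:
        heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]
        i = (i - 1) // 2

def sift_toward_leaves(heap, i, end):
    # After swapping the root out, move the displaced item away from
    # the root until both children are no larger -- O(log n) per removal.
    while True:
        child = 2 * i + 1
        if child >= end:
            return
        if child + 1 < end and heap[child + 1] > heap[child]:
            child += 1
        if heap[i] >= heap[child]:
            return
        heap[i], heap[child] = heap[child], heap[i]
        i = child

heap = []
for item in [5, 2, 8, 1, 9]:                 # items arriving one at a time
    heap.append(item)                         # add at the end...
    sift_toward_root(heap, len(heap) - 1)     # ...then restore the heap

for end in range(len(heap) - 1, 0, -1):       # produce sorted output
    heap[0], heap[end] = heap[end], heap[0]   # base (max) goes to the end
    sift_toward_leaves(heap, 0, end)
print(heap)                                   # [1, 2, 5, 8, 9]
```

In the offline version, the only difference is that the arrival loop runs over an array you already have instead of waiting on a stream; the sifting work per item is identical.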

Building a tree is only O(N^2) if 1) the items arrive perfectly ordered, and 2) you don't realize that, and 3) you don't re-balance the tree as you build it. If the items are in fairly random order to start with, even without balancing, your complexity will be close to O(N log N). In a typical case of building something like a Red-Black or AVL tree, it'll remain O(N log N) even if the items are in order (though it would be O(N log N) with lower constants if they arrive in more or less random order instead of sorted). Likewise, if you use a B-tree, you'll get O(N log N) complexity, even if the data arrive in order.

Upvotes: 1

Kevin Melkowski

Reputation: 483

Insertion sort: As the name itself suggests, it offers a natural solution to the problem, with O(n^2) complexity.

I agree, not all that great, in either case.

Heap sort: The technique is to grow the heap as each data item is pushed. Each push corresponds to a sift-up operation with O(log n) complexity; then we exchange the first element with the last element and heapify to restore the heap property. Heapify is again O(log n), so I get an overall complexity of O(n log n log n). But if we have all the data items present already, it is only O(n log n), because after building the heap we only perform heapify operations on the data items.

You can do this with a min-heap as a binary tree, but if you're using an array, this may be difficult.

Selection sort: It needs all the data items before sorting can begin, so I assume selection sort offers no solution to our problem.

Terrible in either case.

Tree sort: The dominant step in this technique is building the tree, which has a worst-case time complexity of O(n^2); an in-order traversal then produces the sorted output.

O(n log n) if you use radix sort. See the heap sort comments about arrays.

Other sorts:
Counting sort: Terrible if you don't know the data beforehand, since you don't know how much space you will need.

Radix sort: Still amazing; it will be the best sorting algorithm in either case, especially when you do it as a byte-wise sort. All you do is feed the data into a histogram as it comes in and store it in a temporary linked list, then do the radix sort afterwards (see the sketch after this list).

Bucket Sort: Can go either way, much better than counting sort, not as good as radix sort.
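
A minimal sketch of a simplified variant of that idea (buffering arrivals at O(1) each, then running an LSD byte-wise radix sort over the buffer; the function name and the width parameter are assumptions, and it handles only non-negative integers):

```python
def online_radix_sort(stream, width=4):
    buf = list(stream)                     # collect arrivals, O(1) each
    for shift in range(0, width * 8, 8):   # one stable pass per byte
        buckets = [[] for _ in range(256)]
        for x in buf:
            buckets[(x >> shift) & 0xFF].append(x)
        buf = [x for b in buckets for x in b]
    return buf                             # O(width * n) total

print(online_radix_sort(iter([5, 2, 8, 1, 9])))  # [1, 2, 5, 8, 9]
```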

Upvotes: 0
