Reputation:
The wikipedia article for merge sort.
The wikipedia article for quick sort.
Both articles have excellent visualizations.
Both have n*log(n) complexity.
So obviously the distribution of the data will effect the speed of the sort. My guess would be that since a comparison can just as quickly compare any two values, no matter their spread, the range of data values does not matter.
More importantly one should consider the lateral distribution (x direction ) with respect to ordering (magnitude removed).
A good test case to consider would be if the test data had some level of sorting...
Upvotes: 18
Views: 23845
Reputation: 2618
Mergesort is quicker when dealing with linked lists. This is because pointers can easily be changed when merging lists. It only requires one pass (O(n)) through the list.
Quicksort's in-place algorithm requires the movement (swapping) of data. While this can be very efficient for in-memory dataset, it can be much more expensive if your dataset doesn't fit in memory. The result would be lots of I/O.
These days, there is a lot of parallelization that occurs. Parallelizing Mergesort is simpler than Quicksort (in-place). If not using the in-place algorithm, then the space complexity for quicksort is O(n) which is the same are mergesort.
So, to generalize, quicksort is probably more effective for datasets that fit in memory. For stuff that's larger, it's better to use mergesort.
The other general time to use mergesort over quicksort is if the data is very similar (that is, not close to being uniform). Quicksort relies on using a pivot. In the case where all the values are the similar, quicksort hits a worst case of O(n^2). If the values of the data are very similar, then it's more likely that a poor pivot will be chosen leading to very unbalanced partitions leading to an O(n^2) runtime. The most straightforward example is if all the values in the list are the same.
Upvotes: 12
Reputation: 55022
While Java 6 and earlier versions use merge sort as the sorting algorithms, C# uses QuickSort as the sorting algorithm.
QuickSort performs better than merge sort even though they are both O(nlogn). QuickSort's has a smaller constant than merge sort.
Upvotes: 5
Reputation: 279225
Of the two, use merge sort when you need a stable sort. You can use a modified quicksort (such as introsort) when you don't, since it tends to be faster and it uses less memory.
Plain old Quicksort as described by Hoare is quite sensitive to performance-killing special cases that make it Theta(n^2)
, so you normally do need a modified version. That's where the data-distribution comes in, since merge sort doesn't have bad cases. Once you start modifying quicksort you can go on with all sorts of different tweaks, and introsort is one of the more effective ones. It detects on the fly whether it's in a killer case, and if so switches to heapsort.
In fact, Hoare's most basic Quicksort fails worst for already-sorted data, and so your "good test cases" with some level of sorting will kill it to some level. That fact is for curiosity only, though, since it only takes a very small tweak to avoid that, nothing like as complicated as going all the way to introsort. So it's simplistic to even bother analyzing the version that's killed by sorted data.
In practice, in C++ you'd generally use std::stable_sort
and std::sort
rather than worrying too much about the exact algorithm.
Upvotes: 5
Reputation: 153909
It typically depends on the data structures involved. Quick sort is
typically the fastest, but it doesn't guarantee O(n*log(n)); there are
degenerate cases where it becomes O(n^2). Heap sort is the usual
alternative; it guarantees O(n*log(n)), regardless of the initial order,
but it has a much higher constant factor. It's usually used when you
need a hard upper limit to the time taken. Some more recent algorithms
use quick sort, but attempt to recognize when it starts to degenerate,
and switch to heap sort then. Merge sort is used when the data
structure doesn't support random access, since it works with pure
sequential access (forward iterators, rather than random access
iterators). It's used in std::list<>::sort
, for example. It's also
widely used for external sorting, where random access can be very, very
expensive compared to sequential access. (When sorting a file which
doesn't fit into memory, you might break it into chunks which fit into
memory, sort these using quicksort, writing each out to a file, then
merge sort the generated files.)
Upvotes: 18
Reputation: 500247
There is a real-world sorting algorithm -- called Timsort -- that does exploit the idea that data encountered in the wild is often partially sorted.
The algorithm is derived from merge sort and insertion sort, and is used in CPython, Java 7 and Android.
See the Wikipedia article for more details.
Upvotes: 6
Reputation: 29837
Remember in practice, unless you have a very large data set and/or are executing the sort many many times, it probably won't matter at all. That being said, quicksort is generally considered the 'fastest' n*log(n) sorter. See this question already asked: Quick Sort Vs Merge Sort
Upvotes: 1