Jim Blum

Reputation: 2646

Linear time when running heap sort and quick sort

As far as I learned at university, it has been proven that the lower bound for any comparison-based algorithm that sorts random data is Ω(n log n). I also know that the average case of Heapsort and Quicksort is O(n log n).

Therefore, I tried to plot the times these algorithms needed to sort a set of random data.

I used the algorithms posted at Rosetta Code: quicksort and heapsort. When I plotted the time each one needed to sort random data, for up to 1 million numbers, I got the following charts, which appear to be linear:

(Heapsort and Quicksort timing plots)

You can also find the results I got from running heapsort here, and the results I got from running quicksort here.
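
For reference, the timing loop I have in mind looks roughly like this (a minimal Python sketch using a standard-library heapsort as a stand-in, not the actual Rosetta Code implementation):

```python
# Stand-in for the experiment: time a heapsort over random inputs of
# growing size and print one line per size (milliseconds).
import heapq
import random
import time

def heapsort(values):
    # Heapsort via the standard-library binary heap.
    heap = list(values)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

for n in range(100_000, 1_000_001, 100_000):
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    heapsort(data)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{n}\t{elapsed_ms:.1f}")
```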

However, I do get O(n^2) time complexity when running bubble sort, as shown in the plot below:

(Bubble sort timing plot)

Why is this? What might I be missing here?

Upvotes: 2

Views: 1390

Answers (1)

HugoRune

Reputation: 13809

The difference is simply too small to see with the naked eye at this scale:

Using your HeapSort results (600 ms for 1,000,000 entries), here is an O(n) function (green) and an O(n log n) function (red):

(Plot from http://fooplot.com/plot/gnbb0vdgno)

The two functions in this picture are:

  • y = 600/1000000 * x (green)
  • y = 1/(10000 log(10)) * x*log(x) (red)

(Note that these functions have vastly different constant scaling factors, but of course these do not matter in Big-O notation.)
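
You can verify numerically how close these two fitted curves stay across the measured range (a quick sketch; because of the 1/log(10) factor, the red formula evaluates the same whether log is read as log10 or ln):

```python
# Evaluate the two fitted curves from the plot above at a few input sizes.
# Both pass through 600 ms at n = 1,000,000; over the rest of the range
# they differ by less than ~17%, which is easy to lose in measurement noise.
import math

def green(x):            # the O(n) fit: 600/1000000 * x
    return 600 / 1_000_000 * x

def red(x):              # the O(n log n) fit: 1/(10000*log(10)) * x*log(x)
    return x * math.log10(x) / 10_000

for n in (100_000, 250_000, 500_000, 750_000, 1_000_000):
    g, r = green(n), red(n)
    print(f"n={n:>9,}  linear={g:6.1f} ms  nlogn={r:6.1f} ms  gap={100 * (g - r) / g:4.1f}%")
```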

However, just because the differences are hard to see in a diagram does not mean they are impossible to distinguish.

As mentioned in the comments, your main options are bigger datasets or slower comparison functions. Most sorting implementations will allow you to specify a comparison function, which, under normal circumstances, should not change the O() time complexity (beware of non-transitive comparison functions, though).
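
For example, one way to make the comparison cost dominate (a sketch using Python's built-in sort as the algorithm under test; any sort that accepts a comparator works the same way):

```python
# Wrap a deliberately slow, but still transitive, comparator around the sort.
# The extra work per comparison magnifies the n log n term relative to the
# linear overhead of moving elements around.
import functools
import random
import time

def slow_compare(a, b):
    for _ in range(200):      # burn a fixed amount of time per comparison
        pass
    return (a > b) - (a < b)  # standard transitive three-way comparison

for n in (10_000, 20_000, 40_000, 80_000):
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    sorted(data, key=functools.cmp_to_key(slow_compare))
    print(f"n={n:>6}  {time.perf_counter() - start:.2f} s")
```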

If that is not possible, and you just want to treat the algorithms as black boxes, you could simply repeat the experiment and average the results until the noise is low enough to distinguish between these two curves.
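
A sketch of that repeat-and-average approach (the sort is passed in as an opaque black box):

```python
# Average many repetitions on fresh random data to reduce noise.
import random
import statistics
import time

def average_time_ms(sort_fn, n, repeats=10):
    samples = []
    for _ in range(repeats):
        data = [random.random() for _ in range(n)]
        start = time.perf_counter()
        sort_fn(data)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.mean(samples)

# e.g. average_time_ms(sorted, 1_000_000) for the built-in sort
```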

To get the appropriate "ideal" n log n curve for comparison with your averaged data, you need to solve the equation y = a*x*log(x) for a, with y = MAXIMUM_TIME and x = MAXIMUM_INPUT_LENGTH, for example with Wolfram Alpha.
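
If you prefer to do it in code, solving for a directly is straightforward, since a = y / (x * log(x)). The sketch below uses the 600 ms / 1,000,000-entry endpoint from above; substitute your own averaged measurements:

```python
# Fit the "ideal" n log n curve through the measured endpoint.
import math

MAXIMUM_TIME = 600.0             # ms at the largest input, from the data above
MAXIMUM_INPUT_LENGTH = 1_000_000

a = MAXIMUM_TIME / (MAXIMUM_INPUT_LENGTH * math.log10(MAXIMUM_INPUT_LENGTH))

def ideal_nlogn(x):
    return a * x * math.log10(x)

print(a)                         # 1e-4 with a base-10 log
print(ideal_nlogn(500_000))      # ~284.9 ms, for comparison with averaged data
```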


One important point here is that even though these curves look similar, it does not follow that a hypothetical linear sorting algorithm would not be worthwhile for fewer than a million entries. If you managed to come up with a linear sorting algorithm with the same constant factor as the n log n algorithm, the curves would look like this:

(Plot of the linear and n log n curves with equal constant factors)
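
To put rough numbers on that last plot: with equal constant factors the gap is exactly the log factor itself, so (taking base-2 logs, the usual choice for comparison sorts) the linear algorithm would already be roughly 20x faster at a million entries:

```python
# With identical constant factors, (a*n*log n) / (a*n) = log n, so the
# speedup of the hypothetical linear sort is just the log of the input size.
import math

for n in (1_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: n log n is about {math.log2(n):.0f}x slower than linear")
```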

Upvotes: 8
