Celeritas

Reputation: 15063

How does sorting the smaller (sub)array make quicksort faster?

I've heard that in quicksort it is better to recurse on the smaller subarray first. For example, if 5 is the pivot and partitioning leaves the data as 4,1,3,5,7,6, then it's better to sort the subarray 7,6 first because it contains two elements, whereas 4,1,3 contains three.

The source of all knowledge gives the following pseudocode for quicksort:

quicksort(A, i, k):
  if i < k:
    p := partition(A, i, k)
    quicksort(A, i, p - 1)
    quicksort(A, p + 1, k)

So an implementation that recurses on the smaller subarray first would look something like

quicksort(A, i, k):
  if i < k:
    p := partition(A, i, k)
    if p - i > k - p:
      /* the distance from start point to pivot is greater
         than the distance from pivot to end point */
      quicksort(A, p + 1, k)
      quicksort(A, i, p - 1)
    else:
      quicksort(A, i, p - 1)
      quicksort(A, p + 1, k)

I profiled code like this written in Java and it does appear to be faster, but why? At first I thought it had to do with tail-call optimization, but then I realized that was definitely wrong, since Java doesn't support it.

Why is sorting the smaller subarray first faster? This article claims it should be.
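For reference, here is a runnable Java sketch of the variant I profiled. A Lomuto partition (last element as pivot) is assumed; the pseudocode above doesn't fix a particular partition scheme:

```java
import java.util.Arrays;

public class SmallerFirstQuicksort {
    // Lomuto partition: pivot is the last element; returns its final index.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    static void quicksort(int[] a, int lo, int hi) {
        if (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo > hi - p) {        // left side is larger...
                quicksort(a, p + 1, hi);  // ...so recurse on the right (smaller) side first
                quicksort(a, lo, p - 1);
            } else {
                quicksort(a, lo, p - 1);
                quicksort(a, p + 1, hi);
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {4, 1, 3, 5, 7, 6};
        quicksort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // [1, 3, 4, 5, 6, 7]
    }
}
```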

Upvotes: 6

Views: 1298

Answers (3)

Adrian McCarthy

Reputation: 47992

I watched a Leslie Lamport talk recently in which he pointed out that Quicksort, as originally presented, is not necessarily a recursive algorithm, even though many programmers choose to implement it recursively.

Instead of recursing into the partitions, you can simply add them to a queue of ranges that still must be sorted. The algorithm iterates until the queue is empty. Whether you push and pull the range from the head or the tail determines whether you progress depth-first or breadth-first.

quicksort(a, begin, end) {
  queue<range> q;
  q.push_back(range(begin, end));
  while (!q.empty()) {
    r = q.pop_front();
    pivot = partition(a, r.begin, r.end);
    if (pivot > r.begin) q.push_back(range(r.begin, pivot));
    if (pivot + 1 < r.end) q.push_back(range(pivot + 1, r.end));
  }
}

A recursive implementation just uses the call stack as a LIFO queue and therefore naturally proceeds depth-first, and I don't really think it matters which side you process next. The queue length is still bounded by the recursion depth.

But if you're working in a breadth-first manner, using a FIFO queue as shown in the pseudocode, then the order can matter. The larger partition will add more subranges to the queue, so you'd rather process that side when the queue is already as small as possible. To keep the queue small, you want to process the smaller subrange first.
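To make this concrete, here is a Java rendering of that queue-based scheme (my own sketch, not from the talk; a Lomuto partition over half-open ranges is assumed). Using an `ArrayDeque` lets the same code run breadth-first (`pollFirst`) or depth-first (`pollLast`):

```java
import java.util.ArrayDeque;
import java.util.Arrays;

public class QueueQuicksort {
    // A half-open range [begin, end) still waiting to be partitioned.
    static final class Range {
        final int begin, end;
        Range(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Lomuto partition over [begin, end); pivot is the last element.
    static int partition(int[] a, int begin, int end) {
        int pivot = a[end - 1], i = begin;
        for (int j = begin; j < end - 1; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[end - 1]; a[end - 1] = t;
        return i;
    }

    static void quicksort(int[] a, boolean breadthFirst) {
        ArrayDeque<Range> q = new ArrayDeque<>();
        q.add(new Range(0, a.length));
        while (!q.isEmpty()) {
            // pollFirst = FIFO (breadth-first); pollLast = LIFO (depth-first)
            Range r = breadthFirst ? q.pollFirst() : q.pollLast();
            if (r.end - r.begin < 2) continue;   // nothing left to sort
            int p = partition(a, r.begin, r.end);
            q.add(new Range(r.begin, p));        // everything left of the pivot
            q.add(new Range(p + 1, r.end));      // everything right of the pivot
        }
    }

    public static void main(String[] args) {
        int[] a = {4, 1, 3, 5, 7, 6};
        quicksort(a, true);
        System.out.println(Arrays.toString(a)); // [1, 3, 4, 5, 6, 7]
    }
}
```

In the breadth-first mode the deque can grow with every larger partition pushed, which is exactly where processing the smaller subrange first pays off.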

Upvotes: 0

user2566092

Reputation: 4661

Maybe I'm missing something subtle here, but if you recurse on your two cases in a different order, you're just traversing the recursion tree depth-first after potentially swapping children subtrees at each node. The tree is still isomorphic to the original recursion tree, and the set of recursion depths for the base cases is unchanged. The only way I can see getting a demonstrable reduction in recursion depth, or some other recursion-related speed-up, is something like this: recurse on the smaller subarray, then pick a pivot for the larger subarray (without recursing), and then recurse on the two subcases of the larger subarray. This turns your recursion tree into a ternary tree instead of a binary one, which should typically have lower maximum depth.

Upvotes: 1

Pauli Nieminen

Reputation: 1120

The Java JIT compiler can eliminate tail recursion, which reduces stack memory use. Reduced stack memory may lower cache pressure, allowing a larger partition to fit into a faster cache level.

Another minor improvement is a reduced number of function calls, so there is a small reduction in the number of instructions executed. But instruction count usually has very low correlation with performance when operating on data that doesn't fit in cache.
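Since Java gives no guarantee of tail-call elimination, the same effect can be achieved by hand: turn the second recursive call into a loop and recurse only into the smaller partition. A sketch (Lomuto partition assumed, not part of the original answer) where the stack depth is bounded by O(log n), because each recursive call covers at most half of its range:

```java
import java.util.Arrays;

public class DepthBoundedQuicksort {
    // Lomuto partition over [lo, hi]; pivot is the last element.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    // Recurse only into the smaller partition; iterate on the larger one.
    // The recursive call covers at most half the current range, so the
    // stack depth stays O(log n) even on adversarial input.
    static void quicksort(int[] a, int lo, int hi) {
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                quicksort(a, lo, p - 1);  // smaller left side: recurse
                lo = p + 1;               // larger right side: loop
            } else {
                quicksort(a, p + 1, hi);  // smaller right side: recurse
                hi = p - 1;               // larger left side: loop
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {4, 1, 3, 5, 7, 6};
        quicksort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // [1, 3, 4, 5, 6, 7]
    }
}
```

This is the transformation many library implementations apply explicitly rather than hoping the JIT performs it.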

Upvotes: 0
