happygilmore

Reputation: 3095

Quicksort and tail recursive optimization

In Introduction to Algorithms (p. 169) it talks about using tail recursion for Quicksort.

The original Quicksort algorithm earlier in the chapter is (in pseudo-code)

Quicksort(A, p, r)
{
 if (p < r)
 {
  q <- Partition(A, p, r)
  Quicksort(A, p, q)
  Quicksort(A, q+1, r)
 }
}

The optimized version using tail recursion is as follows

Quicksort(A, p, r)
{
 while (p < r)
 {
  q <- Partition(A, p, r)
  Quicksort(A, p, q)
  p <- q+1
 }
}

where Partition rearranges the subarray around a pivot and returns the split point q.

The difference is that the second algorithm calls Quicksort recursively only once, to sort the LHS; the RHS is handled by the while loop.

Can someone explain to me why the first algorithm could cause a stack overflow, whereas the second wouldn't? Or am I misunderstanding the book?

Upvotes: 21

Views: 15669

Answers (6)

Sanpreet

Reputation: 103

In the second function you shared, tail call elimination is implemented. Before proceeding further, let us understand what a tail recursive call is: if a recursive call is the last statement in the function and nothing is done after it returns, it is a tail call. So in the first function, the second Quicksort call is a tail call. For such a function, with some changes in the code, one can remove that last recursive call, as you showed in function 2, which performs the same work as function 1. This process is called tail recursion optimization or tail call elimination, and the following are its results:

  • Optimizing in terms of auxiliary space
  • Optimizing in terms of recursion call overhead

The last recursive call is eliminated by using the while loop. The good thing is that in function 2 no auxiliary stack space is used for the right-hand call, since its recursion is replaced by p <- q+1, and so that call's overhead is gone. Note, however, that the left-hand call still consumes stack space, so on its own this only removes half the calls; if you additionally always recurse on the smaller side of the partition, the maximum space needed is Theta(log n).
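
For concreteness, here is a minimal Python sketch of function 2. This is my own rendering, not the book's code: it uses a Lomuto-style Partition (pivot = last element), so the left-hand recursion is on (p, q-1) rather than the Hoare-style (p, q) shown in the question.

def partition(A, p, r):
    # Lomuto partition: A[r] is the pivot; returns its final index.
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def quicksort(A, p, r):
    while p < r:                # the loop replaces the eliminated tail call
        q = partition(A, p, r)
        quicksort(A, p, q - 1)  # left half: still a genuine recursive call
        p = q + 1               # right half: handled by the next loop pass

A = [5, 3, 8, 1, 9, 2]
quicksort(A, 0, len(A) - 1)
print(A)  # [1, 2, 3, 5, 8, 9]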

Upvotes: 0

Pedrom

Reputation: 3823

First, let's start with a brief, probably not perfectly accurate but still valid, description of what a stack overflow is.

As you probably know, there are two different kinds of memory, implemented in two different data structures: the heap and the stack.

In terms of size, the heap is bigger than the stack, and to keep it simple let's say that every time a function call is made a new environment (local variables, parameters, etc.) is created on the stack. Given that, and the fact that the stack's size is limited, if you make too many nested function calls you will run out of space, hence you will have a stack overflow.

The problem with recursion is that, since you are creating at least one environment on the stack per call, you can occupy a lot of the limited stack very quickly, so stack overflows are commonly associated with recursive calls.

So there is this thing called tail call optimization (TCO) that reuses the same environment every time a recursive call is made, so the space occupied on the stack stays constant, preventing the stack overflow issue.

Now, there are some rules in order to perform a tail call optimization. First, each call must be complete: the function should be able to give a result at any moment if you interrupt the execution. In SICP this is called an iterative process, even when the function is recursive.

If you analyze your first example, you will see that each invocation makes two recursive calls, which means that if you stop the execution at any time you won't be able to give a partial result, because the result depends on those calls finishing; in this scenario you can't reuse the stack environment, because the necessary information is split across all those pending recursive calls.

However, the second example doesn't have that problem: A is constant, and the state of p and r can be determined locally, so all the information needed to keep going is at hand and TCO can be applied.
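
To make the SICP "iterative process" idea concrete, here is a small sketch. This is my own example, not from the book, and note that Python itself does not perform TCO, so read the second function as pseudocode for a language that does.

def fact(n):
    # Recursive process: the multiplication happens AFTER the call
    # returns, so every level needs its own stack frame.
    if n == 0:
        return 1
    return n * fact(n - 1)

def fact_iter(n, acc=1):
    # Iterative process: the recursive call is the last action and all
    # state travels in the arguments, so a TCO-capable compiler can
    # reuse one frame. Interrupt it at any moment and (n, acc) fully
    # describe the state of the computation.
    if n == 0:
        return acc
    return fact_iter(n - 1, acc * n)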

Upvotes: 11

saolof

Reputation: 1661

Tail recursion by itself is not enough. The algorithm with the while loop can still use O(N) stack space; reducing it to O(log(N)) is left as an exercise in that section of CLRS.

Assume we are working in a language with array slices and tail call optimization. Consider the difference between these two algorithms:

Bad:

Quicksort(arraySlice) {
 if (arraySlice.length > 1) {
  slices = Partition(arraySlice)
  (smallerSlice, largerSlice) = sortBySize(slices)
  Quicksort(largerSlice) // Not a tail call, requires a stack frame until it returns. 
  Quicksort(smallerSlice) // Tail call, can replace the old stack frame.
 }
}

Good:

Quicksort(arraySlice) {
 if (arraySlice.length > 1){
  slices = Partition(arraySlice)
  (smallerSlice, largerSlice) = sortBySize(slices)
  Quicksort(smallerSlice) // Not a tail call, requires a stack frame until it returns. 
  Quicksort(largerSlice) // Tail call, can replace the old stack frame.
 }
}

The second one is guaranteed to never need more than log2(length) stack frames, because smallerSlice is less than half as long as arraySlice. But for the first one the inequality is reversed: it will always need at least log2(length) stack frames, and can require O(N) stack frames in the worst case, where smallerSlice always has length 1.

If you don't keep track of which slice is smaller, you will have worst cases similar to the first, overflowing version, even though you will need only O(log(N)) stack frames on average. If you always sort the smaller slice first, you will never need more than log2(length) stack frames.

If you are using a language that doesn't have tail call optimization, you can write the second (not stack-blowing) version as:

Quicksort(arraySlice) {
 while (arraySlice.length > 1) {
  slices = Partition(arraySlice)
  (smallerSlice, arraySlice) = sortBySize(slices)
  Quicksort(smallerSlice) // Still not a tail call, requires a stack frame until it returns. 
 }
}
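
In case it helps, here is a runnable Python adaptation of that last version, index-based rather than slice-based; the Lomuto partition and the depth counter are my additions for illustration.

def partition(A, p, r):
    # Lomuto partition: A[r] is the pivot; returns its final index.
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def quicksort(A, p, r, depth=1, stats=None):
    if stats is not None:
        stats["max_depth"] = max(stats.get("max_depth", 0), depth)
    while p < r:
        q = partition(A, p, r)
        if q - p < r - q:       # left slice is the smaller one
            quicksort(A, p, q - 1, depth + 1, stats)
            p = q + 1           # keep looping on the larger right slice
        else:                   # right slice is the smaller one
            quicksort(A, q + 1, r, depth + 1, stats)
            r = q - 1           # keep looping on the larger left slice

stats = {}
A = list(range(2000, 0, -1))    # reverse-sorted input
quicksort(A, 0, len(A) - 1, stats=stats)
print(stats["max_depth"])       # never exceeds about log2(2000) + 1 ~ 12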

Another thing worth noting is that if you are implementing something like Introsort, which switches to Heapsort when the recursion depth exceeds some number proportional to log(N), you will never hit the O(N) worst-case stack memory usage of quicksort, so you technically don't need to do this. Doing this optimization (recursing on the smaller slice first) still improves the constant factor of the O(log(N)), though, so it is strongly recommended.

Upvotes: 8

Rohit

Reputation: 10246

Well, if you consider the two methods, the first makes recursive calls on both the LHS and the RHS, so it keeps more frames on the stack at once; as a result there are more chances of getting a stack overflow.

Note: that doesn't mean there are absolutely no chances of getting a stack overflow in the second method.

Upvotes: 0

MK.

Reputation: 34587

The essence of tail recursion optimization is that there is no recursion when the program actually executes. When the compiler or interpreter is able to kick TRO in, it essentially figures out how to rewrite your recursively-defined algorithm into a simple iterative process, with the stack not used to store nested function invocations.
The first code snippet can't be fully TR-optimized, because it contains two recursive calls and only the last one is in tail position.
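
As a sketch of what that rewrite looks like mechanically, here is my own illustrative example, using gcd rather than Quicksort for brevity:

def gcd(a, b):
    # Tail call: the call is the last action and its result is returned
    # unchanged, so a TCO-capable compiler can reuse the current frame.
    if b == 0:
        return a
    return gcd(b, a % b)

def gcd_loop(a, b):
    # The mechanical rewrite: overwrite the parameters and jump back to
    # the top instead of making a call, so the stack never grows.
    while b != 0:
        a, b = b, a % b
    return a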

Upvotes: 7

Jean-Paul

Reputation: 21180

Well, the most obvious observation would be:

Most common stack overflow problem - definition

The most common cause of stack overflow is excessively deep or infinite recursion.

The second keeps fewer recursive calls pending than the first (one recursive call per invocation instead of two), hence it is less likely to cause a stack overflow.

(So fewer simultaneous stack frames means less chance of causing a stack overflow.)

But somebody would have to add why the second can never cause a stack overflow while the first can.

source

Upvotes: 3
