Reputation: 11431
I am reading a quick sort implementation using a stack at the following link.
My question is regarding the following paragraph.
The policy of putting the larger of the small subfiles on the stack ensures that each entry on the stack is no more than one-half of the size of the one below it, so that the stack needs to contain room for only about lg N entries. This maximum stack usage occurs when the partition always falls at the center of the file. For random files, the actual maximum stack size is much lower; for degenerate files it is likely to be small.
This technique does not necessarily work in a truly recursive implementation, because it depends on end- or tail-recursion removal. If the last action of a procedure is to call another procedure, some programming environments will arrange things such that local variables are cleared from the stack before, rather than after, the call. Without end-recursion removal, we cannot guarantee that the stack size will be small for quicksort.
What does the author mean by "that each entry on the stack is no more than one-half of the size of the one below it"? Could you please give an example of this.
How did the author came to the conclusion that the stack needs space for only about lg N
entries?
What does authore mean by "Without end-recursion removal, we cannot guarantee that the stack size will be small for quicksort" ?
Thanks for your time and help.
Upvotes: 1
Views: 3906
Reputation: 183868
The policy of putting the larger of the small subfiles on the stack ensures that each entry on the stack is no more than one-half of the size of the one below it,
That is not quite true. Consider you want to sort a 100-element array, and the first pivot goes right in the middle. Then you have a stack
49
50
then you pop the 49-element part off the stack, partition, and push the two parts on the stack. Let's say the choice of pivot was not quite as good this time, there were 20 elements not larger than the pivot. Then you'd get the stack
20
28
50
and each stack entry is more than half of the one below.
But that cannot continue forever, and we have
During the entire sorting, if stack level
k
is occupied, its size is at mosttotal_size / (2^k)
.
That is obviously true when the sorting begins, since then there is only one element on the stack, at level 0, which is the entire array of size total_size
.
Now, assume the stated property holds on entering the loop (while(!stack.empty())
).
A subarray of length s
is popped from stack level m
. If s <= 1
, nothing else is done before the next loop iteration, and the invariant continues to hold. Otherwise, if s >= 2
, After partitioning that, there are two new subarrays to be pushed on the stack, with s-1
elements together. The smaller of those two then has a size smaller_size <= (s-1)/2
, and the larger has a size larger_size <= s-1
. Stack level m
will be occupied by the larger of the two, and we have
larger_size <= s-1 < s <= total_size / (2^m)
smaller_size <= (s-1)/2 < s/2 <= total_size / (2^(m+1))
for the stack levels m
resp. m+1
at the end of the loop body. The invariant holds for the next iteration.
Since at most one subarray of size 0 is ever on the stack (it is then immediately popped off in the next iteration), there are never more than lg total_size + 1
stack levels occupied.
Regarding
What does author mean by "Without end-recursion removal, we cannot guarantee that the stack size will be small for quicksort" ?
In a recursive implementation, you can have deep recursion, and when the stack frame is not reused for the end-call, you may need linear stack space. Consider a stupid pivot selection, always choosing the first element as pivot, and an already sorted array.
[0,1,2,3,4]
partition, pivot goes in position 0, the smaller subarray is empty. The recursive call for the larger subarray [1,2,3,4]
, allocates a new stack frame (so there are now two stack frames). Same principle, the next recursive call with the subarray [2,3,4]
allocates a third stack frame, etc.
If one has end-recursion removal, i.e. the stack frame is reused, one has the same guarantees as with the manual stack above.
Upvotes: 1
Reputation: 1817
I will try to answer your question (hopefully I am not wrong)... Every step in quicksort you divide your input into two (one half). By doing so, you need logN. This explains your first and second question ("each entry on the stack is no more than one-half" and "logN" entries)
Upvotes: 0