Reputation: 880
I found this code online while browsing around for different quicksort implementations:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Quicksort
{
class Program
{
static void Main(string[] args)
{
// Create an unsorted array of string elements
string[] unsorted = { "z","e","x","c","m","q","a"};
// Print the unsorted array
for (int i = 0; i < unsorted.Length; i++)
{
Console.Write(unsorted[i] + " ");
}
Console.WriteLine();
// Sort the array
Quicksort(unsorted, 0, unsorted.Length - 1);
// Print the sorted array
for (int i = 0; i < unsorted.Length; i++)
{
Console.Write(unsorted[i] + " ");
}
Console.WriteLine();
Console.ReadLine();
}
public static void Quicksort(IComparable[] elements, int left, int right)
{
int i = left, j = right;
IComparable pivot = elements[(left + right) / 2];
while (i <= j)
{
while (elements[i].CompareTo(pivot) < 0)
{
i++;
}
while (elements[j].CompareTo(pivot) > 0)
{
j--;
}
if (i <= j)
{
// Swap
IComparable tmp = elements[i];
elements[i] = elements[j];
elements[j] = tmp;
i++;
j--;
}
}
// Recursive calls
if (left < j)
{
Quicksort(elements, left, j);
}
if (i < right)
{
Quicksort(elements, i, right);
}
}
}
}
I understand how nearly all of it works but I was wondering why in the recursive calls they are using left, j and i, right for the high and low. I would think that you would want to use left, pivotIndex and pivotIndex, right. I tried this and it doesn't work but I don't really understand why. I have found a couple more in other languages that do it the same so I'm guessing it's correct (and it seems to work so that's also a good indicator that it works right). I also don't get why everyone seems to store the value of the pivot rather than the index of the pivot. If I modify this to use the index of the pivot rather than the value that also seems to work but it seems rather significant that they do this and many other implementation of quicksort do it the same way. Can someone please help me understand this?
Upvotes: 2
Views: 2118
Reputation: 6552
That code is actually a fairly clear implementation of quicksort. I worked one full summer on nothing but sorting algorithms, so this question caught my eye.
The key to understanding this particular program is that every recursive function call receives the same full string. So the function has to set its own boundaries, i.e., "left" and "right" and ignore the remainder of the string. Every recursive function call only deals with partitioning a small part of the string.
The original function call partitions the string into portions less than the pivot (lower portion) and greater than the pivot (upper portion), leaving any string values equal to the pivot in a "no man's land" somewhere in between the left and right portions.
As soon as it gets to the recursive function calls, it gets a bit more murky, and it is important to understand that these lower and upper portions are different from the "left" and "right" variables. The "left" variable is the original position of the left boundary of the entire string that the function call is supposed to operate on, and similarly for the "right" variable.
The key to the algorithm is understanding that the first function call leaves nothing necessarily sorted in the upper or lower portions. However, it removes one or more partition values from consideration and ensures that they are above all lower string values, and below all upper string values, i.e., in the "middle" of the string.
This process is repeated recursively on both of the unsorted upper and lower portions of the string until eventually reaching an upper or lower portion size of two or fewer values, which must be sorted already as a result of being partitioned.
The quicksort algorithm has a worst case performance of O(N^2) if the pivot value is always an extreme value, and a best case performance of O(N log N) if the pivot value is close to the middle of the string. For instance, when sorting 31 characters and assuming a perfect median choice of pivot,
first stage:
lower 15, pivot, upper 15
second stage:
(lower 7, pivot, upper 7), pivot, (lower 7, pivot, upper 7)
third stage:
((lower 3, pivot, upper 3), pivot, (lower 3, pivot, upper 3)), pivot, ((lower 3, pivot, upper 3), pivot, (lower 3, pivot, upper 3))
fourth stage:
(((lower 1, pivot, upper 1), pivot, (lower 1, pivot, upper 1)), pivot, ((lower 1, pivot, upper 1), pivot, (lower 1, pivot, upper 1))), pivot, (((lower 1, pivot, upper 1), pivot, (lower 1, pivot, upper 1)), pivot, ((lower 1, pivot, upper 1), pivot, (lower 1, pivot, upper 1)))
Four stages is less than or equal to the base-two logarithm of 31, and at every stage the algorithm is linear (although performed in multiple distinct recursive function calls at each level), so 31 steps are performed, i.e., N steps.
So the total order of the algorithm is 4 stages times 31, or approximately N times log N.
In reality, sometimes a pivot is not the median, and may even be an extreme value, in which case the "lower" or "upper" portions are empty. The algorithm might remove only one value from consideration at each stage, resulting in a total of N-1 stages to complete the algorithm. At every stage, there would be N steps of work, so the total order would be N times (N-1), which is O(N^2) since there is a product of N with N.
You can see a nested list of the sorting operations for 10 million items here:
http://www.myersdaily.org/joseph/unix/sort/in10.html
and read some observations I made about the quicksort algorithm on Unix here:
http://www.myersdaily.org/joseph/unix/sort.html
(which was improved by a negligible 7% by using distribution-free statistics to avoid steps on when "lucky" occasions presented themselves).
All in all, quicksort is a very good algorithm, and very hard to improve, as long as one's algorithm avoids deliberate pernicious cases which could otherwise be exploited to feed a known codebase with input which produces a pivot sequence of extreme values.
Upvotes: 5