Reputation: 1316
Im solving the QuickSort assignment at Algorithms class by Stanford and using the median rule to select the pivot element. The input is numbers from 1-10000 and output is the number of comparisons
My function are as follows :
public static int noOfComp = 0;
public static void quick_sort(int[] a, int p, int r){
if(p<r) {
noOfComp+= r-p;
int mid = partition(a, p, r);
quick_sort(a, p, mid-1);
quick_sort(a, mid+1, r);
}
}
public static int median(int a[],int p, int r){
int firstPos = p;
int len = r-p+1;
int lastPos = r;
int midPos = len%2==0 ? p + (len)/2-1: p + (len)/2 ;
int first = a[firstPos];
int middle = a[midPos];
int last = a[lastPos];
if (first <= middle) {
if (middle <= last) {
// first - middle - last
return midPos;
} else if (first <= last) {
// first - last - middle
return lastPos;
}
// last - first - middle
return firstPos;
}
if (first <= last) {
// middle - first - last
return firstPos;
} else if (middle <= last) {
// middle - last - first
return lastPos;
}
// last - middle - first
return midPos;
}
public static int partition(int[] a, int p, int r){
int chosen = median(a,p,r);
swap(a, p, chosen);
int pivot = a[p];
int i = p;
for (int j = p+1; j < a.length; j++) {
if (a[j] < pivot) {
i++;
swap(a, i, j);
}
}
swap(a, i,p);
return i;
}
//main
public static void main(String[] args) throws Throwable{
int i=0;
Scanner in = new Scanner(new File("C:\\Users\\Uzumaki Naruto\\Documents\\QuickSort.txt"));
while(in.hasNext()){
i++;
in.next();
}
int[] a = new int[i];
i=0;
Scanner in2 = new Scanner(new File("C:\\Users\\Uzumaki Naruto\\Documents\\QuickSort.txt"));
while(in2.hasNext()){
a[i++] = in2.nextInt();
}
quick_sort(a, 0, a.length-1);
System.out.println("Number of comparisons : " + noOfComp);
}
The answer to question seems to be around 128k , but my algorithm output it 132k. I've read the code number of times but unable to ascertain the error.
Upvotes: 0
Views: 119
Reputation: 350715
Indeed, I also get an average count of around 132k with your code, executed on randomly shuffled arrays of unique numbers. I did not find any mistake in the algorithm, except for the following one, but it's not influencing your count result, which assumed correct code:
The loop in partition has a bad exit condition:
for (int j = p+1; j < a.length; j++) {
It should be:
for (int j = p+1; j <= r; j++) {
The following is not an error, but you can rewrite
int len = r-p+1;
int midPos = len%2==0 ? p + (len)/2-1: p + (len)/2 ;
as:
int midPos = p + (r-p)/2;
But: You did not count the comparisons made in the function median, and this should normally be done, otherwise an algorithm cannot be fairly compared with another (variant). So that results in 2 or 3 more comparisons per call of partition. This increases the average count to around 148k!
Here it says that:
the expected number of comparisons needed to sort n elements with random pivot selection is 1.386 n.log(n). Median-of-three pivoting brings this down to ≈ 1.188 n.log(n).
The thing is that for n = 10 000, 1.188 n.log(n) ≈ 158k so your algorithm seems to do fewer comparisons than this estimate, at least for this particular case of n.
I do see a way to reduce that number again.
The main idea is to profit from the comparisons you make in the function median by already putting the lowest and highest of the three inspected values in the right partition, so they do not need to be treated further by the loop in the function partition.
To give an example, if you have an array like this:
5, 1, 2, 9, 3
Then median will compare 5, 2 and 3 and choose 3 as pivot value. The function could now be extended to also put the three investigated elements in the right order, without extra comparisons, to get this:
2, 1, 3*, 9, 5
And then the pivot element would not have to be swapped to the start of the array, but to the second slot, because we already have decided that the left most element belongs to the lower partition:
2, 3*, 1, 0, 5
And now the main partition loop can concentrate on this sub-array, because also the last element is known the belong to the upper partition:
2, 3*, [1, 0], 5
At the end of the loop the final swap will be with the second element instead of the first:
2, 0, 1, 3*, 5
This will reduce the number of comparisons in the main loop with 2.
In this variant, the median function will always return the index of the second slot, after making a few swaps in the array:
public static int median(int a[],int p, int r){
int m = p + (r-p)/2;
// actually sort the three elements:
noOfComp++;
if (a[r] < a[m]) {
swap(a, r, m);
}
if (p < m) { // more than 2 elements
noOfComp++;
if (a[m] < a[p]) {
swap(a, m, p);
noOfComp++;
if (a[r] < a[m]) {
swap(a, r, m);
}
}
// put the middle element (pivot) in second slot
swap(a, m, p+1);
}
return p+1;
}
And partition will look like this:
public static int partition(int[] a, int p, int r){
int k = median(a, p, r); // always returns p+1 as pivot's index
int i = k; // (k..i] is lower partition
for (int j = p+2; j < r; j++) { // positions p and r can be excluded
if (a[j] < a[k]) {
i++;
swap(a, i, j);
}
}
swap(a, i, k); // place pivot between partitions
return i;
}
In quick_sort the count of comparisons will be two less:
noOfComp += r-p-2;
With the above adjustments the number of comparisons goes down from 148k to 135k on average.
So I am afraid that although the actual number of comparisons has been reduced this way, it still does not match the 128k.
I tried using insertion sort when the array became small, but it did not yield much of an improvement. Another idea is to improve the search for the median by looking at more elements, but only if the array is not too small, as the cost of looking for one must be small compared to the partitioning effort.
But the assignment may not allow for all this tweaking.
Upvotes: 1