Celeritas
Celeritas

Reputation: 15063

using insertion sort once in quicksort

According to here

Use insertion sort...for invocations on small arrays (i.e. where the length is less than a threshold k determined experimentally). This can be implemented by simply stopping the recursion when less than k elements are left, leaving the entire array k-sorted: each element will be at most k positions away from its final position. Then, a single insertion sort pass finishes the sort in O(k×n) time.

I'm not sure I'm understanding correctly. One way to do it that involves calling insertion sort multiple times is

quicksort(A, i, k):
  if i+threshold < k:
    p := partition(A, i, k)
    quicksort(A, i, p - 1)
    quicksort(A, p + 1, k)
  else
    insertionsort(A, i, k)

but this would call insertionsort() for each subarray. It sounds like insertion sort could be called only once, but I sort of don't understand this because it doesn't matter how many times insertion sort is called it's still generally slower than quicksort.

Is the idea like this?

sort(A)
 quicksort(A, 0, A.length-1)
 insertionsort(A, 0, A.length-1)

So basically call insertion sort once at the very end? How do you know it would only take one pass and not run at O(n)?

Upvotes: 0

Views: 917

Answers (1)

David Eisenstat
David Eisenstat

Reputation: 65488

Yes, your second pseudocode is correct. The usual analysis of insertion sort is that the outer loop inserts each element in turn (O(n) iterations), and the inner loop moves that element to its correct place (O(n) iterations), for a total of O(n^2). However, since your second quicksort leaves an array that can be sorted by permuting elements within blocks of size at most threshold, each element moves at most threshold positions, and the new analysis is O(n*threshold), which is equivalent to running insertion sort on each block separately.

Described by Bentley in the 1999 edition of Programming Pearls, this idea (per Wikipedia) avoids the overhead of starting and stopping the insertion sort loop many times (in essence, we have a natural sentinel value for the insertion loop already in the array). IMHO, it's a cute idea but not clearly still a good one given how different the performance characteristics of commodity hardware are now (specifically, the final insertion sort requires another pass over the data, which has gotten relatively more expensive, and the cost of starting the loop (moving some values in registers) has gotten relatively less expensive).

Upvotes: 1

Related Questions