Reputation: 2179
I was just reading the following question: Radix sort most significant first or least significant, which is faster?
The author of the accepted answer suggests that MSD radix sort is indeed faster, but I do not see why.
I have implemented both LSD and MSD (binary based, using shift operations). LSD is iterative and requires just one bucket array, while MSD is recursive and requires a separate bucket array for every recursive call.
If you create a random array of 10 million integers, I cannot see how MSD would be faster than LSD, since you allocate extra bucket arrays every time you re-enter the function and also pay the overhead of the recursive calls themselves.
I can see how a combination of MSD and LSD can give an overall boost (run MSD for the first few bits and LSD for the rest to reduce cache misses), but how is MSD alone expected to be more efficient than LSD, given its recursive nature and the fact that it needs a new bucket array for every recursive call, while LSD is iterative and needs just one bucket array for the entire sort?
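For concreteness, here is a minimal sketch of the kind of binary LSD pass I mean (illustrative only, not my actual implementation): one bit per pass, iterative, and a single small counting array reused for every pass.
#include <cstdint>
#include <vector>

// Illustrative binary LSD radix sort for unsigned 32-bit keys: 32 stable
// passes, one per bit, all sharing the same two-entry counting array.
void lsd_binary_sort(uint32_t *a, int n) {
    std::vector<uint32_t> buffer(n);
    for (int bit = 0; bit < 32; bit++) {
        int count[2] = {0, 0};                       // the single bucket/count array
        for (int i = 0; i < n; i++) {
            count[(a[i] >> bit) & 1]++;
        }
        int pos[2] = {0, count[0]};                  // write positions for the 0- and 1-bucket
        for (int i = 0; i < n; i++) {
            buffer[pos[(a[i] >> bit) & 1]++] = a[i]; // stable distribution into buffer
        }
        for (int i = 0; i < n; i++) {
            a[i] = buffer[i];                        // copy back for the next pass
        }
    }
}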
Upvotes: 5
Views: 3894
Reputation: 3370
The number of passes in MSD radix sort depends on the input size, whereas the number of passes in LSD radix sort depends on the key length. MSD radix sort therefore often needs significantly fewer passes than LSD radix sort, which is why it is faster.
Memory allocations are not an issue, as MSD radix sort can easily be implemented in-place.
I've made implementations of both LSD and MSD radix sort, to see which of their properties make MSD radix sort faster than LSD radix sort.
I compared their speeds with std::sort on an array of 100,000,000 random positive 63-bit integers (std::sort's result was also used to verify the sorted arrays). MSD radix sort turned out to be slightly faster than std::sort, and when the leaves are sorted with insertion_sort, it is quite a bit faster.
I believe the smaller number of passes is the reason why MSD radix sort is often faster than LSD radix sort. If the input data is uniformly randomly distributed, the expected running time of MSD radix sort is O(n log(n)/log(d)), whereas LSD radix sort's running time is O(n k), where d is the number of buckets per digit and k the number of digits in a key. Usually n is a lot smaller than d^k, so MSD needs fewer passes; only when n grows to the order of d^k or beyond would LSD radix sort be faster. However, in that case counting sort (radix sort with k = 1, i.e. a single pass over the whole key) can be used as well.
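To put concrete numbers on this for the test above (just plugging the test's parameters into the formulas): with 3-bit digits (d = 8) and 63-bit keys, LSD radix sort always makes k = 63/3 = 21 full passes over all n = 100,000,000 elements, while on uniformly random input MSD radix sort only recurses until a bucket shrinks to about 20 elements, i.e. roughly log_8(10^8/20) ≈ 7.4 levels, so each element is touched in about a third as many passes.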
#include <cstdint>
#include <vector>

// Standard insertion sort, used to finish off small buckets in msd_sort.
inline void insertion_sort(int64_t * array, int n) {
    for (int i=1; i<n; i++) {
        int64_t val = array[i];
        int j = i;
        while (j>0 && array[j-1] > val) {
            array[j] = array[j-1];
            j--;
        }
        array[j] = val;
    }
}
void msd_sort(int64_t * array, int n, int64_t bit=60) {
    const int64_t mask = INT64_C(7);          // 3-bit digits, 8 buckets per level
    // Count bucket sizes (count[i+1] temporarily holds the size of bucket i).
    int count[9]={};
    for (int i=0; i<n; i++) {
        count[((array[i]>>bit) & mask)+1]++;
    }
    // Create holes: turn the counts into bucket boundaries and lift the first
    // element of every non-empty bucket onto the small 'unsorted' stack, so
    // each such bucket has a free write slot at loc[i].
    int loc[8];           // next write position of each bucket
    int64_t unsorted[8];  // lifted elements waiting to be placed
    int live = 0;         // number of lifted elements
    for (int i=0; i<8; i++) {
        loc[i] = count[i];
        count[i+1]+=count[i];                 // count[i+1] is now the end of bucket i
        if (loc[i] < count[i+1]) {            // bucket i is non-empty
            unsorted[live++] = array[loc[i]];
        }
    }
    live--;               // index of the top of the stack
    // Perform the in-place distribution: place the top lifted element into its
    // bucket's write slot and lift whatever occupies that bucket's next slot.
    for (int i=0; i<n; i++) {
        int64_t val = unsorted[live];
        int64_t d = (val>>bit) & mask;
        array[loc[d]] = val;
        loc[d]++;
        if (loc[d] < count[d+1]) {
            unsorted[live] = array[loc[d]];   // lift the next element of bucket d
        } else {
            live--;                           // bucket d is full; drop its stack slot
        }
    }
    // Recurse into each bucket on the next 3 bits; small buckets are finished
    // off with insertion sort instead.
    if (bit>0) {
        for (int i=0; i<8; i++) {
            n = count[i+1] - count[i];        // size of bucket i
            if (n > 20) { // If replaced by n > 1, insertion_sort is not needed.
                msd_sort(array + count[i], n, bit-3);
            } else {
                insertion_sort(array + count[i], n);
            }
        }
    }
}
void lsd_sort(int64_t * array, int n) {
    const int64_t mask = INT64_C(7);          // 3-bit digits, 8 buckets per pass
    std::vector<int64_t> buffer(n);
    // 21 passes, one per 3-bit digit, always over the whole array.
    for (int64_t bit=0; bit<63; bit+=3) {
        // Copy and count
        int count[9]={};
        for (int i=0; i<n; i++) {
            buffer[i] = array[i];
            count[((array[i]>>bit) & mask) + 1]++;
        }
        // Init writer positions
        for (int i=0; i<8; i++) {
            count[i+1]+=count[i];
        }
        // Perform sort: stable distribution from buffer back into array
        for (int i=0; i<n; i++) {
            int64_t val = buffer[i];
            int64_t d = (val>>bit) & mask;
            array[count[d]] = val;
            count[d]++;
        }
    }
}
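For reference, a sketch of the kind of test harness meant above (the seed, the timing code and the output format are illustrative, not my exact test): fill an array with random positive 63-bit integers, sort copies with std::sort, msd_sort and lsd_sort, and use std::sort's result to verify the radix-sorted arrays.
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

template <class F>
double seconds(F sort_fn) {                         // time one sorting call
    auto t0 = std::chrono::steady_clock::now();
    sort_fn();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    const int n = 100000000;                        // 100,000,000 keys (needs several GB of RAM)
    std::mt19937_64 rng(12345);                     // arbitrary fixed seed
    std::uniform_int_distribution<int64_t> dist(0, INT64_MAX);   // uniform over [0, 2^63)

    std::vector<int64_t> ref(n);
    for (auto &x : ref) x = dist(rng);
    std::vector<int64_t> a = ref, b = ref;          // copies for the two radix sorts

    std::printf("std::sort: %.2f s\n", seconds([&]{ std::sort(ref.begin(), ref.end()); }));
    std::printf("msd_sort:  %.2f s\n", seconds([&]{ msd_sort(a.data(), n); }));
    std::printf("lsd_sort:  %.2f s\n", seconds([&]{ lsd_sort(b.data(), n); }));

    assert(ref == a && ref == b);                   // verify against std::sort's result
    return 0;
}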
Upvotes: 7
Reputation: 21778
The question you are referring to performs the sort on a single bit at a time. For that reason, each recursion level splits the array into only two groups, so there is not much you actually have to store when recursing.
You descend into the smaller group and put the bigger group on a stack for later. In the worst case you will have at most w entries on the stack, where w is the width (in bits) of your data type.
Now, having shown that recursion is not that bad in this case, Niklas B.'s argument for why MSD has a chance to run faster (better cache locality) still stands.
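For illustration, a minimal sketch of that scheme (not code from the question, just the idea): an in-place, one-bit-at-a-time MSD radix sort where the bigger half of each partition is parked on a small explicit stack and the smaller half is processed immediately.
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative one-bit MSD radix sort for non-negative 63-bit keys. Each
// partition step continues with the smaller half and stashes the bigger half,
// so at any time the stack holds at most about w pending ranges.
void binary_msd_sort(int64_t *a, int n) {
    struct Range { int lo, hi, bit; };
    std::vector<Range> pending;
    pending.push_back({0, n, 62});                  // start at the most significant value bit
    while (!pending.empty()) {
        Range r = pending.back();
        pending.pop_back();
        while (r.hi - r.lo > 1 && r.bit >= 0) {
            // Partition by the current bit: zeros to the front, ones to the back.
            int i = r.lo, j = r.hi;
            while (i < j) {
                if (((a[i] >> r.bit) & 1) == 0) i++;
                else std::swap(a[i], a[--j]);
            }
            Range zeros = {r.lo, i, r.bit - 1};
            Range ones  = {i, r.hi, r.bit - 1};
            // Keep working on the smaller group, put the bigger one aside.
            if (zeros.hi - zeros.lo <= ones.hi - ones.lo) {
                pending.push_back(ones);
                r = zeros;
            } else {
                pending.push_back(zeros);
                r = ones;
            }
        }
    }
}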
Upvotes: 1