Reputation: 4509
In the critical path of my program, I need to sort an array (specifically, a C++ std::vector<int64_t>, using the gnu c++ standard libray). I am using the standard library provided sorting algorithm (std::sort), which in this case is introsort.
I was curious about how well this algorithm performs, and when doing some research on various sorting algorithms different standard and third party libraries use, almost all of them care about cases where 'n' tends to be the dominant factor.
In my specific case though, 'n' is going to be on the order of 2-20 elements. So the constant factors could actually be dominant. And things like cache effects might be very different when the entire array we are sorting fits into a couple of cache lines.
What are the best sorting algorithms for cases like this where the constant factors likely overwhelm the asymptotic factors? And do there exist any vetted C++ implementations of these algorithms?
Upvotes: 4
Views: 1722
Reputation: 12332
Watch https://www.youtube.com/watch?v=FJJTYQYB1JQ
A simple linear insertion sort is really fast. Making a heap first can improve it a bit.
Sadly the talk doesn't compare that against the hardcoded solutions for <= 15 elements.
Upvotes: 2
Reputation: 59174
Introsort takes your concern into account, and switches to an insertion sort implementation for short sequences.
Since your STL already provides it, you should probably use that.
Upvotes: 5
Reputation: 61519
It's impossible to know the fastest way to do anything without knowing exactly what the "anything" is.
Here is one possible set of assumptions:
With that set of assumptions (and possibly some other sets), the best algorithms for small numbers of elements will be hand-crafted sorting networks, tailored to the exact length of the input array. (These always perform the same number of comparisons; it isn't feasible to "short-circuit" these algorithms conditionally because the "conditions" would depend on detecting data that is already partially sorted, which still requires comparisons.)
For a network sorting four elements (in the known-optimal five comparisons), this might look like (I did not test this):
template<class RandomIt, class Compare>
void _compare_and_swap(RandomIt first, Compare comp, int x, int y) {
if (comp(first[x], first[y])) {
auto tmp = first[x];
arr[x] = arr[y];
arr[y] = tmp;
}
}
// Assume there are exactly four elements available at the `first` iterator.
template<class RandomIt, class Compare>
void network_sort_4(RandomIt first, Compare comp) {
_compare_and_swap(2, 0);
_compare_and_swap(1, 3);
_compare_and_swap(0, 1);
_compare_and_swap(2, 3);
_compare_and_swap(1, 2);
}
In real-world environments, of course, we will have different assumptions. For small numbers of elements, with real data (but still assuming we must do comparison-based sorts) it will be difficult to beat naive implementations of insertion sort (or bubble sort, which is effectively the same thing) that have been compiled with good optimizations. It's really not feasible to reason about these things by hand, considering both the complexity of the hardware level (e.g. the steps it takes to pipeline instructions and then compensate for branch mis-predictions) and the software level (e.g. the relative cost of performing the swap vs. performing the comparison, and the effect that has on the constant-factor analysis of performance).
Upvotes: 1
Reputation: 480
Insertion sort or selection sort are both typically faster for small arrays (i.e., fewer than 10-20 elements).
Upvotes: 2