Reputation: 52987
Most of the time we use built-in library sorts, which are generic, comparison-based routines. But much of the time we're also sorting numeric keys, or values that can be mapped to integer indices. If I'm not mistaken, sorting numbers can be done in O(n). So why don't we ever use numeric sorting algorithms?
Upvotes: 1
Views: 300
Reputation: 55609
I'm not really sure that (single) integers (or floating-point values, for that matter, though most numeric sorts require, or are only efficient for, integers) are what's being sorted 'most of the time', so an algorithm that only works on integers doesn't seem particularly useful. I say 'single' integers as opposed to strings or objects (or equivalent) that contain multiple integers, numbers, strings or whatever else.
Not to mention that (I believe) the bottleneck of most real-world programs whose primary purpose is more than just sorting data shouldn't be sorting 'single' numbers with an O(n log n) sort in the first place. You're probably far better off changing how your data is represented to remove the need for the sort than shaving off the log n factor.
It's a common misconception, but no sorting algorithm (numeric or otherwise) is actually worst-case O(n). There's always some additional parameter that comes into play. For radix sort, the length of the keys is the determining factor: for long numbers in short arrays, that length can easily exceed log n, resulting in worse performance than an O(n log n) sort (see the test below).
Now, numeric sorts are useful, and far better than any comparison-based sorting algorithm, provided your data conforms to specific constraints most (but not all) of the time. By looking at the complexity given in any decent reference you can usually see what determines whether it will do well: O(kN), for example, implies that long keys slow it down, while things like handling duplicates well are a bit more subtle.
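To make that k concrete, here is a minimal LSD radix sort sketch (in Java, since that's the library this answer refers to later; the method name and the restriction to non-negative ints are my own simplifications, not the implementation used in the benchmark below):

```java
// Minimal LSD radix sort sketch for non-negative ints, base 256.
// The outer loop runs once per byte of the key width (4 passes for 32-bit
// keys) regardless of n; that is the k in O(k*n).
static void radixSort(int[] a) {
    int[] buf = new int[a.length];
    for (int shift = 0; shift < 32; shift += 8) {              // one pass per byte
        int[] count = new int[257];
        for (int x : a) count[((x >>> shift) & 0xFF) + 1]++;   // histogram of digits
        for (int d = 0; d < 256; d++) count[d + 1] += count[d]; // prefix sums -> start offsets
        for (int x : a) buf[count[(x >>> shift) & 0xFF]++] = x; // stable scatter by digit
        System.arraycopy(buf, 0, a, 0, a.length);
    }
}
```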
Without extensive real-world experience / theoretical knowledge you're unlikely to pick the most efficient algorithm; it's entirely possible you'll find yourself with a problem where the chosen algorithm, which should be awesome in theory, severely under-performs a standard algorithm on your data because of some subtle factor.
So standard libraries don't put you in the position of picking the wrong sort and getting terrible performance because your data doesn't meet some constraint. Library sorts tend to be decent all-rounders, but they aren't specialized for specific data sets. There are surely libraries devoted to sorting that let you pick from an extensive range of algorithms, but your average Joe the programmer probably doesn't want to be / shouldn't be exposed to that choice.
Also note that, while they aren't commonly included in standard libraries, it's easy enough to find or write an implementation of whichever (popular) sort you wish to use ... which you should then benchmark against the library sort on a sufficient sample of your data before committing to it.
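The linked benchmark isn't reproduced here, but a rough harness along these lines (my own sketch, assuming the radixSort method above is pasted into the same class; the sizes and random seed are arbitrary) is all it takes to check an implementation against the library sort on your own data:

```java
import java.util.Arrays;
import java.util.Random;

// Rough benchmarking sketch: time a hand-rolled sort against the library sort
// on data shaped like yours. Single-shot timings like these are noisy; a real
// comparison would warm up the JIT and average several runs.
public class SortBench {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int size : new int[]{10, 100, 1_000, 10_000, 100_000}) {
            int[] data = rnd.ints(size, 0, Integer.MAX_VALUE).toArray();
            int[] a = data.clone(), b = data.clone();

            long t0 = System.nanoTime();
            radixSort(a);                 // the sketch above
            long radixNs = System.nanoTime() - t0;

            t0 = System.nanoTime();
            Arrays.sort(b);               // the library's dual-pivot quicksort for int[]
            long libNs = System.nanoTime() - t0;

            System.out.printf("%8d %12d %12d%n", size, radixNs, libNs);
        }
    }
}
```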
This is by no means intended to be a conclusive, 100% correct test with the best implementations of radix sort and quick sort to ever see the light of day. It's more to show that what the data looks like plays a large role in the performance of any given algorithm.
This is the only decent benchmark including radix-sort I could find in a few minutes of searching.
I ran the code and found this: (number range 0-2147483646)
(the time unit is nanosecond-based, so the absolute values don't translate neatly into seconds; compare them relative to each other)
ArraySize Radix Quick
10 1889 126
100 2871 2702
1000 18227 38075
10000 360623 484128
100000 2306284 6029230
With the large number range, Quick-sort is faster for arrays of up to about 100 elements (exactly what I was saying above). Interesting, but nothing really amazing about it; I mean, who cares about the performance of sorting fewer than 100 numbers?
However, look what happened when I changed the number range to 0-99:
ArraySize Radix Quick
10 1937 121
100 8932 2022
1000 29513 14824
10000 236669 125926
100000 2393641 1225715
Quick-sort is consistently around 2x faster than Radix-sort for reasonably-sized arrays (1000-100000 elements).
You must be thinking: "What in the world? I thought radix sort was supposed to be good at exactly this; there are only 2 digits. And why is Quick-sort so much faster here than in the case above?" Exactly. That's where "extensive real-world experience / theoretical knowledge" comes in. I suspect it comes down to how well each algorithm / implementation deals with duplicates. And part of it could be that I didn't optimize the radix sort implementation for the smaller range (didn't know you can do that? Well, that's another reason against trying to have one generic radix sort in a library).
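As an aside (my own sketch, not anything from the linked benchmark): if you genuinely know the values live in a tiny range like 0-99, the specialized answer isn't even radix sort but a plain counting sort, which is exactly the kind of data-specific assumption a generic library sort can't make:

```java
// Counting sort sketch for values known to lie in [0, 100):
// one pass to histogram the values, one pass to rewrite the array.
// Correct only because the range is tiny and known in advance.
static void countingSort(int[] a) {
    int[] count = new int[100];
    for (int x : a) count[x]++;
    int i = 0;
    for (int v = 0; v < 100; v++)
        for (int c = 0; c < count[v]; c++)
            a[i++] = v;
}
```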
Now 0-99 is probably not your typical data set either, and, overall, radix-sort is probably still better, but what you need to take away from all of this:
- There's about a gazillion sorting algorithms, and they vary greatly in what they're good at. Don't expect a standard library to give you a function for each.
- Comparison-based sorts can sort any comparable data type (and are fast enough for most practical applications), whereas numeric sorts can only sort numbers.
- Thus, having a single comparison-based sort (or two, as Java has) in your library (you being the one writing that library) is the preferred choice.
Upvotes: 3
Reputation: 14205
Basically, we use comparison-based sorting algorithms because it's easier. Being able to supply a comparison function and get your data sorted is a huge win from an engineering perspective, even if you pay for it with a speed hit.
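To illustrate that win (a sketch of my own, in Java since it comes up elsewhere on this page; the Employee type is made up): the same library call sorts anything you can write a comparison function for.

```java
import java.util.Arrays;
import java.util.Comparator;

// The engineering win in one call: the same library sort handles any element
// type once you supply a comparison function. (Employee is a made-up example
// type; records need Java 16+.)
record Employee(String name, int salary) {}

class ComparatorDemo {
    public static void main(String[] args) {
        Employee[] staff = {
            new Employee("Ada", 120_000),
            new Employee("Bob",  90_000),
            new Employee("Cy",   95_000),
        };
        // Sort by salary; no numeric-sort machinery needed for the objects.
        Arrays.sort(staff, Comparator.comparingInt(Employee::salary));
        System.out.println(Arrays.toString(staff));
    }
}
```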
Keep in mind that the O(n log n) comparison-based sorting bound counts comparisons, not total runtime. If you're sorting strings, for instance, comparison can take time linear in the lengths of the strings being compared.
A common misconception (that I see echoed in the other answer) is that comparison-based sorting winds up having faster asymptotic complexity when you're sorting a moderate number of long numbers; say they're k bytes each. This simply isn't true; you do about n log(n) number comparisons, each of which takes O(k) time, for an overall complexity of O(k n log n). This is worse than O(k n).
Engineering a fast radix sort is a little harder than the theory says. While theory dictates that you should choose as large a radix as possible, there is a tradeoff between the radix you choose and the locality you achieve when partitioning the input stream. A bigger radix means fewer passes but also less local use of memory.
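For a rough back-of-the-envelope feel for that tradeoff with 32-bit keys: a radix of 2^8 means ceil(32/8) = 4 passes with only 256 counting buckets, which fit comfortably in cache, whereas a radix of 2^16 halves that to 2 passes but needs 65,536 buckets and scatters writes across a much larger working set.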
Upvotes: 1