I've been looking around trying to learn of any practical applications where sorting is needed and its efficiency matters, but couldn't find anything. The only examples I could find or think off either did not need total sorting (like when looking for 100 top results or for the median) or sorting efficiency was hardly important (like when sorting once a year a spreadsheet with student names or past transactions). When sorting web search results, only a few dozens of top ranked results need to be found and sorted, not all of the Internet, so classical sorting algorithms are not needed or practical. When sorting a spreadsheet, it hardly matters if it will be sorted by a triple-pivot Las Vegas randomised quicksort or by the insertion sort. Using sorted arrays as sets or associative arrays seems to be practically less efficient than using hash tables. So my question is: what are practical ("real-life") examples where a total sorting is required and its efficiency is a bottleneck ? I am particularly curious about applications for comparison sorting . Update. I've stumbled upon this phrase in lecture notes by Steven Skiena: Computers spend more time sorting than anything else, historically 25% on mainframes. With some details, that could make a perfect answer to my question. Where can I find the source for this statistics, ideally with some details about the kind and the application of sorting done by mainframes?

Reputation: 4071

Practical importance of efficient sorting algorithms

I've been looking around trying to learn of any practical applications where sorting is needed and its efficiency matters, but couldn't find anything.

The only examples I could find or think off either did not need total sorting (like when looking for 100 top results or for the median) or sorting efficiency was hardly important (like when sorting once a year a spreadsheet with student names or past transactions).

When sorting web search results, only a few dozens of top ranked results need to be found and sorted, not all of the Internet, so classical sorting algorithms are not needed or practical.
When sorting a spreadsheet, it hardly matters if it will be sorted by a triple-pivot Las Vegas randomised quicksort or by the insertion sort.
Using sorted arrays as sets or associative arrays seems to be practically less efficient than using hash tables.

So my question is: what are practical ("real-life") examples where a total sorting is required and its efficiency is a bottleneck? I am particularly curious about applications for comparison sorting.

Update.

I've stumbled upon this phrase in lecture notes by Steven Skiena:

Computers spend more time sorting than anything else, historically 25% on mainframes.

With some details, that could make a perfect answer to my question. Where can I find the source for this statistics, ideally with some details about the kind and the application of sorting done by mainframes?

Upvotes: 0

Answers (2)

Jim Mischel

Reputation: 134125

Imagine you have a daily list of transactions (deposits and withdrawls) for bank accounts. There are millions of accounts, and millions of transactions per day. Each night, you have to update the accounts to reflect those transactions, and compute the interest accrued that day, and print a report, ordered by account, that shows each account with its daily activity.

One way to do that is to go through the list sequentially, reading a transaction and updating the account in the database. That will work, but it has several drawbacks, including:

If there are many transactions for a single account, you pay the price of retrieving and updating the account for every transaction. Considering that a business account can have thousands of transactions per day, those costs add up.
The typical rule is that deposits are recorded before withdrawals, so as to prevent an overdraft. If an account's balance is 0, and the transactions list has a withdrawal of $5 ahead of a $10 deposit, the system will record an overdraft when it shouldn't.
Printing the report would require a separate scan of the database, after all transactions are recorded.

The solution to those problems is to sort the transactions list by account and type (deposits first). Then, the update is a simple merge operation. You read the database and the transactions list in account number order, apply any transactions for that account, compute interest, print the output line, and write the updated record to the database.

The result is much faster than doing a read-update-write for every single transaction, and it eliminates the problems #2 and #3 I outlined above. Sort-and-merge makes the difference between the update taking all night, and the update taking a few hours.

Also, MapReduce (and Hadoop), used for processing big data, make good use of sorting. Those programming models simply would not be possible without high performance sorting algorithms.

Any time you need to merge multiple large data streams into a single output stream (and those applications are legion), the sort-and-merge approach is useful. There are times when other techniques might be faster, but the sort-and-merge is reliable and durable, and, as shown by MapReduce, scales well.

Upvotes: 1

Thomas

Reputation: 182063

In some graphics rendering algorithms, objects need to be drawn in back to front order. A good example is transparent particles: there can be hundreds of thousands of them, and because of the transparency, traditional depth buffering doesn't work. So you need to sort these particles by distance from the camera, and keep them sorted, at 60 frames per second.

Interestingly, if the order of the particles doesn't change much (relatively slow particle motion, little camera movement), then the array of particles will already be "mostly sorted" in the next frame, and a simple bubble sort or insertion sort can actually work fine. But on frames where many particles are created, or the camera moves quickly, sort performance can become important, simply because there are so many other things to do each frame.

Upvotes: 2

Practical importance of efficient sorting algorithms

Answers (2)

Related Questions