gyrolf

Reputation: 3872

Looking for a sorting algorithm with as few compare operations as possible

I want to sort items where the comparison is performed by humans.

For these tasks the number of comparisons is the limiting factor for performance.

Upvotes: 13

Views: 4359

Answers (12)

kuzzooroo

Reputation: 7408

https://stackoverflow.com/a/53979250/2829764 points to Merge-insertion sort, as "perhaps still the best freely-documented sorting algorithm for minimal comparisons."

The answer also notes:

Glenn K. Manacher's algorithm “and later record-breaking sorting algorithms” (none of which, unfortunately, seem to be freely documented at this time) are reported to adapt the merge-insertion sort algorithm to offer even fewer comparisons.

However, you might consider how many things you're sorting. In your case, it's probably not too many, so the asymptotic results in books may not apply. Instead, you (or whoever is reading this answer years later) might want to generate random permutations of realistic size for your problem and see how many comparisons it takes to sort them. For 500 items, either Ford-Johnson or a traditional merge sort already needs roughly 3,800 comparisons. That's a lot for a human.
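
If you want to check this for your own n, here's a quick sketch; it uses Python's built-in sort as a stand-in for whichever algorithm you end up choosing, so the exact count will differ a bit for merge-insertion sort:

    import random
    from functools import cmp_to_key

    def comparisons_to_sort(n, trials=10):
        """Average number of comparisons needed to sort a random permutation of n items."""
        total = 0
        def cmp(a, b):
            nonlocal total
            total += 1                      # every call here would be one question to a human
            return (a > b) - (a < b)
        for _ in range(trials):
            perm = random.sample(range(n), n)
            sorted(perm, key=cmp_to_key(cmp))
        return total / trials

    print(comparisons_to_sort(500))         # roughly how many questions you'd be asking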

Upvotes: 0

supercat

Reputation: 81105

If comparisons are expensive relative to book-keeping costs, you might try the following algorithm which I call "tournament sort". First, some definitions:

  1. Every node has a numeric "score" property (which must be able to hold values from 1 to the number of nodes), and "last-beat" and "fellow-loser" properties, which must be able to hold node references.
  2. A node is "better" than another node if it should be output before the other.
  3. An element is considered "eligible" if there are no elements known to be better than it which have not been output, and "ineligible" if any element which has not been output is known to be better than it.
  4. The "score" of a node is the number of nodes it's known to be better than, plus one.

To run the algorithm, initially assign every node a score of 1. Repeatedly compare the two lowest-scoring eligible nodes; after each comparison, mark the loser "ineligible", and add the loser's score to the winner's (the loser's score is unaltered). Set the loser's "fellow loser" property to the winner's "last-beat", and the winner's "last-beat" property to the loser. Iterate this until only one eligible node remains. Output that node, and make eligible all nodes the winner beat (using the winner's "last-beat" and the chain of "fellow-loser" properties). Then continue the algorithm on the remaining nodes.
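
A rough Python sketch of one way to implement the above (the bookkeeping is deliberately naive, and better(a, b) stands in for the expensive human comparison):

    class Node:
        def __init__(self, value):
            self.value = value
            self.score = 1            # nodes it's known to be better than, plus one
            self.last_beat = None     # most recent node this one beat
            self.fellow_loser = None  # previous node beaten by the same winner

    def tournament_sort(values, better):
        """Return values in order; better(a, b) is True if a should be output before b."""
        eligible = [Node(v) for v in values]
        output = []
        while eligible:
            while len(eligible) > 1:
                # Compare the two lowest-scoring eligible nodes.
                eligible.sort(key=lambda n: n.score)
                a, b = eligible[0], eligible[1]
                winner, loser = (a, b) if better(a.value, b.value) else (b, a)
                winner.score += loser.score            # the loser's score is unaltered
                loser.fellow_loser = winner.last_beat
                winner.last_beat = loser
                eligible.remove(loser)                 # loser becomes ineligible
            # Output the last eligible node and re-activate everything it beat.
            top = eligible.pop()
            output.append(top.value)
            n = top.last_beat
            while n is not None:
                eligible.append(n)
                n = n.fellow_loser
        return output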

The number of comparisons with 1,000,000 items was slightly lower than that of a stock library implementation of Quicksort; I'm not sure how the algorithm would compare against a more modern version of QuickSort. Bookkeeping costs are significant, but if comparisons are sufficiently expensive the savings could possibly be worth it. One interesting feature of this algorithm is that it will only perform comparisons relevant to determining the next node to be output; I know of no other algorithm with that feature.

Upvotes: 2

Beth

Reputation: 4660

To answer this, we need to make a lot of assumptions.

Let's assume we are sorting pictures by cuteness. The goal is to get the maximum usable information from the human in the least amount of time. This interaction will dominate all other computation, so it's the only one that counts.

As someone else mentioned, humans can deal well with ordering several items in one interaction. Let's say we can get eight items in relative order per round.

Each round introduces seven edges into a directed graph where the nodes are the pictures. If node A is reachable from node B, then node A is cuter than node B. Keep this graph in mind.

Now, let me tell you about a problem the Navy and the Air Force solve differently. They both want to get a group of people into height order, and quickly. The Navy tells people to get in line; then, if you're shorter than the guy in front of you, switch places, and repeat until done. In the worst case, that's N*N comparisons.

The Air Force tells people to stand in a square grid. They shuffle front-to-back among sqrt(N) people, which means a worst case of sqrt(N)*sqrt(N) == N comparisons. However, the people are only sorted along one dimension, so the people then face left and do the same shuffle again. Now we're up to 2*N comparisons, and the sort is still imperfect, but it's good enough for government work: there's a short corner, a tall corner opposite, and a clear diagonal height gradient.

You can see how the Air Force method gets results in less time if you don't care about perfection. You can also see how to reach perfection efficiently: you already know that the very shortest and very tallest people are in two corners, so the second-shortest might be behind or beside the shortest, the third-shortest might be behind or beside him, and so on. In general, someone's height rank is also his maximum possible Manhattan distance from the short corner.

Looking back at the graph analogy, the eight nodes to present each round are eight of those sharing the currently most common length of longest inbound path. The length of the longest inbound path also represents the node's minimum possible sorted rank.
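
A rough sketch of that batch-picking step (the names here are made up; beats[a] holds the nodes that a has been ranked directly above, i.e. a's inbound edges):

    from collections import Counter

    def record_round(ranking, beats):
        # ranking: the eight items best-first, as the human returned them; add the edges.
        for cuter, less_cute in zip(ranking, ranking[1:]):
            beats.setdefault(cuter, set()).add(less_cute)

    def longest_inbound_path_lengths(nodes, beats):
        # Longest inbound path length == the node's minimum possible rank (1-based).
        # Memoized DFS; assumes the human's answers stay acyclic.
        memo = {}
        def depth(n):
            if n not in memo:
                memo[n] = 1 + max((depth(m) for m in beats.get(n, ())), default=0)
            return memo[n]
        return {n: depth(n) for n in nodes}

    def next_batch(nodes, beats, k=8):
        # Pick k nodes sharing the currently most common longest-inbound-path length.
        depths = longest_inbound_path_lengths(nodes, beats)
        most_common_len, _ = Counter(depths.values()).most_common(1)[0]
        return [n for n in nodes if depths[n] == most_common_len][:k]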

You'll use a lot of CPU following this plan, but you will make the best possible use of your human resources.

Upvotes: 8

Cybergibbons

Reputation: 464

The question raises more questions, really.

Are we talking about a single human performing the comparisons? It's a very different challenge if you're talking about a group of humans trying to arrange objects in order.

What about questions of trust and error? Not everyone can be trusted to get everything right; certain sorts would go catastrophically wrong if at any given point you provided the wrong answer to a single comparison.

What about subjectivity? "Rank these pictures in order of cuteness." Once you get to this point, it could get really complex. As someone else mentions, something like "Hot or Not" is the simplest conceptually, but isn't very efficient. At its most complex, I'd say that Google is a way of sorting objects into an order, where the search engine is inferring the comparisons made by humans.

Upvotes: 1

ConcernedOfTunbridgeWells

Reputation: 66612

From an assignment I once did on this very subject ...

These comparison counts are for various sorting algorithms operating on data in random order:

Size      QkSort    HpSort   MrgSort     ModQk   InsrtSort
  2500     31388     48792     25105     27646     1554230
  5000     67818    107632     55216     65706     6082243
 10000    153838    235641    120394    141623    25430257
 20000    320535    510824    260995    300319   100361684
 40000    759202   1101835    561676    685937
 80000   1561245   2363171   1203335   1438017
160000   3295500   5045861   2567554   3047186

These comparison counts are for various sorting algorithms operating on data that starts out 'nearly sorted'. Amongst other things, it shows the pathological case of quicksort:

Size      QkSort    HpSort   MrgSort     ModQk   InsrtSort
  2500     72029     46428     16001     70618      76050
  5000    181370    102934     34503    190391    3016042
 10000    383228    226223     74006    303128   12793735
 20000    940771    491648    158015    744557   50456526
 40000   2208720   1065689    336031   1634659  
 80000   4669465   2289350    712062   3820384
160000  11748287   4878598   1504127  10173850

From this we can see that merge sort is the best by number of comparisons.

I can't remember what the modifications to the quick sort algorithm were, but I believe it was something that used insertion sorts once the individual chunks got down to a certain size. This sort of thing is commonly done to optimise quicksort.
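
For illustration, that kind of tweak usually looks something like this (the cutoff value here is just a guess):

    CUTOFF = 16   # below this size, insertion sort wins on constant factors

    def modified_quicksort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if hi - lo + 1 <= CUTOFF:
            # Insertion-sort the small slice in place.
            for i in range(lo + 1, hi + 1):
                x, j = a[i], i - 1
                while j >= lo and a[j] > x:
                    a[j + 1] = a[j]
                    j -= 1
                a[j + 1] = x
            return
        # Otherwise, a plain Hoare-style partition around the middle element.
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        modified_quicksort(a, lo, j)
        modified_quicksort(a, i, hi)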

You might also want to look up Tadao Takaoka's 'Minimal Merge Sort', which is a more efficient version of the merge sort.

Upvotes: 5

James Anderson

Reputation: 27478

Merge sort is definitely the way to go here, as you can use a Map/Reduce type algorithm to have several humans doing the comparisons in parallel.

Quicksort is essentially a single threaded sort algorithm.

You could also tweak the merge sort algorithm so that instead of comparing two objects, you present your human with a list of, say, five items and ask him or her to rank them.
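
A sketch of what that could look like; rank_items is a hypothetical stand-in for the human interaction, and for brevity only the top-ranked item is committed each round (a real implementation would reuse the rest of the ranking):

    def human_multiway_merge(runs, rank_items):
        """runs: already-sorted lists; rank_items(items) returns them best-first."""
        out = []
        runs = [list(r) for r in runs if r]
        while runs:
            heads = [r[0] for r in runs]
            best = rank_items(heads)[0]        # only the top answer is safe to commit
            runs[heads.index(best)].pop(0)
            out.append(best)
            runs = [r for r in runs if r]      # drop runs that have been exhausted
        return out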

Another possibility would be to use a ranking system like the one used by the famous "Hot or Not" web site. This requires many, many more comparisons, but the comparisons can happen in any sequence and in parallel, so it would work faster than a classic sort provided you have enough humans at your disposal.

Upvotes: 1

Alex Gartrell

Reputation: 2554

People are really good at ordering 5-10 things from best to worst, and come up with more consistent results when doing so. I think trying to apply a classical sorting algorithm might not work here because of the multi-compare approach that is typical of humans.

I'd argue that you should have a round robin type approach and try to bucket things into their most consistent groups each time. Each iteration would only make the result more certain.

It'd be interesting to write too :)

Upvotes: 3

Erich Kitzmueller

Reputation: 36977

You should consider that humans might make non-transitive comparisons, e.g. they favor A over B, B over C but also C over A. So when choosing your sort algorithm, make sure it doesn't completely break when that happens.

Upvotes: 3

kgiannakakis

Reputation: 104168

Here is a comparison of algorithms. The two best candidates are Quick Sort and Merge Sort. Quick Sort is in general better, but has worse worst-case performance.

Upvotes: 1

Jaelebi

Reputation: 6119

The best one would be merge sort.

The minimum run time is n*log(n) [base 2]. The way it is implemented:

If the list is of length 0 or 1, then it is already sorted.

Otherwise:

  1. Divide the unsorted list into two sublists of about half the size.
  2. Sort each sublist recursively by re-applying merge sort.
  3. Merge the two sublists back into one sorted list.
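
A straightforward Python version of those steps might look like:

    def merge_sort(items):
        # A list of length 0 or 1 is already sorted.
        if len(items) <= 1:
            return items
        # Divide into two halves and sort each recursively.
        mid = len(items) // 2
        left = merge_sort(items[:mid])
        right = merge_sort(items[mid:])
        # Merge the two sorted halves back into one sorted list.
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]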

Upvotes: 0

SmacL

Reputation: 22922

Pigeonhole sorting is order N and works well with humans if the data can be pigeonholed. A good example would be counting votes in an election.
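
For the vote-counting example, a minimal sketch (the candidate names are made up):

    from collections import defaultdict

    def count_votes(ballots):
        piles = defaultdict(int)
        for candidate in ballots:
            piles[candidate] += 1      # each ballot drops straight into its pigeonhole
        return dict(piles)

    print(count_votes(["Alice", "Bob", "Alice", "Carol", "Alice"]))
    # {'Alice': 3, 'Bob': 1, 'Carol': 1}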

Upvotes: 4

Jon Skeet

Reputation: 1499950

I don't think you're likely to get a better answer than the Wikipedia page on sorting.

Summary:

  • For arbitrary comparisons (where you can't use something like radix sorting) the best you can achieve is O(n log n).
  • Various algorithms achieve this - see the "comparison of algorithms" section.
  • The commonly used QuickSort is O(n log n) in a typical case, but O(n^2) in the worst case; there are often ways to avoid this, but if you're really worried about the cost of comparisons, I'd go with something like MergeSort or a HeapSort. It partly depends on your existing data structures.

If humans are doing the comparisons, are they also doing the sorting? Do you have a fixed data structure you need to use, or could you effectively create a copy using a balanced binary tree insertion sort? What are the storage requirements?

Upvotes: 1
