Reputation: 113976
In an application I'm doing, I am given a list of items, and relations between them and I need to calculate a sorted list. For fun, lets use an exam scoring example, we know that:
John has done better than Jane
Jane has done worse than Luke
Luke has done better than Jason
Jason has done worse than Tim
For this list, I need to calculate the following sorted list:
How do I derive an item's position in the sorted list, based on the contested relations?
Upvotes: 3
Views: 464
Reputation: 7798
Fundamentally, sorting require a linear order.
You need your relation to be a linear order, which means :
So, what does it mean? Most usual relations are linear. For example, if you are talking about grades, which are in fact real numbers, then yes, grades can be linearly ordered. Rankings too, as they again are only real numbers.
But for example, if you're ranking chess players, you cannot use their match history. If player A has beaten player B in the majority of their match, and B has beaten C again in the majority of their match, this does NOT mean that A > B > C, simply because C may have beaten A in the majority of their match.
This is by the way one of the reasons why Elo ratings exist, they needed a linear order to sort who's the best player, and notice how Elo is basically just a real number, thus has a linear order.
So, for exam scoring, you simply do NOT have that linear order. Why? It's real numbers! Well, true, you have this linear order, but you cannot ask this linear order everything, You're restricted to the predefined comparison list, which is here :
- John has done better than Jane
- Jane has done worse than Luke
- Luke has done better than Jason
- Jason has done worse than Tim
So, the answer is : No, in general, you can't sort a list if you do not have the comparison between any pair of elements.
Now, I'm working on a "least bad" answer, but it's not trivial to me...
Edit : OK, I came up with something. The idea is that you are going to extract the most possible information on your predefined comparison list. Then treat any comparison that is not in your extended comparison list as an equality.
So, how do you do it?
First, for any a < b comparison, add the entry b > a. Then, you will look through your predefined comparison list for any comparison a < b and b < c and add (if it's already not in here) all corresponding a < c (transitivity) and c > a. If you know there are no equalities (no students have the same grade), it's finished. Else, you have to add all the reflexivity and antisymmetry relations. You can't do anything about totality. Now, you have your extended comparison list.
Now, take your favourite sorting algorithm. If you don't have one, quicksort is nice, efficient and short to write. Then everytime the algorithm asks for a comparison between a and b, instead, look at your extended comparison list. If the comparison is here, great, else, treat the comparison as a=b. (Note that it doesn't matter if you know that there are no equalities, the algorithm doesn't care)
The result will be a "least badly" sorted list. There are usually several possible "least badly" sorted list, but this algorithm will give you just one.
So, let's give an example. It's nearly the same as the one given int the OP, with a slight difference ("John has done worse than Luke" instead of "Jane has done worse than Luke"). so, we start with :
Now, for every "X is worse than Y", I add "Y is better than X", and vice-versa. The new sentences are in bold :
Finally, I scan all the possible sentences "X has done better then Y" and "Y has done better than Z", and I add "X has done better than Z"
The extended table is finished.
Let's now take a look at the pseudocode of quicksort :
function quicksort('array')
if length('array') ≤ 1
return 'array' // an array of zero or one elements is already sorted
select and remove a pivot value 'pivot' from 'array'
create empty lists 'less' and 'greater'
for each 'x' in 'array'
if 'x' ≤ 'pivot' then append 'x' to 'less'
else append 'x' to 'greater'
return concatenate(quicksort('less'), 'pivot', quicksort('greater'))
The only difference with the standard quicksort is that you cannot generally know the answer to the question if('x' ≤ 'pivot') ...
. So instead, if x=Luke
and pivot = Tim
, you look in your table for the sentence "Luke has done worse than Tim". If you find it, then you consider that the answer is true
and do append 'x' to 'less'
. If you find in the table "Luke has done better than Tim", then you consider that the answer is false, and you do append 'x' to 'greater'
. And finally, if you can't find any of the two sentences mentionned above, you act as if 'x' == 'pivot'
, and you do append 'x' to 'less'
Upvotes: 3
Reputation: 19601
If you throw a http://en.wikipedia.org/wiki/Topological_sorting algorithm at this, you can look at the result to see if any two people are in a fixed order: if so, one will be reachable from the other. If your data is consistent, a topological sort will give you a linear order which is consistent with your data, but may imply orderngs that your data does not actually support, due to lack of comparisons to tell whether Adam really has done better than Bill.
Unless your data is very special, you will probably find a few cases where there are cycles in the graph and so no order at all is consistent. You could find groups of people linked in such cycles with http://en.wikipedia.org/wiki/Strongly_connected_component.
If you have to resolve noisy and inconsistent data, you will probably have to turn to some sort of statistical ranking system. There is a sort of summary at http://en.wikipedia.org/wiki/Pairwise_comparison. Further web-searching will show you that the Bradley-Terry model can be fitted in a variety of ways including logistic regression, so it is one starting point, although of course there is a huge variety of other scoring systems, some associated with sports and games such as http://en.wikipedia.org/wiki/Elo_rating_system
Upvotes: 2