Reputation: 135
The goal is to sort a list X of n unknown variables {x0, x1, x2, ... x(n-1)} using a list C of m comparison results (booleans). Each comparison is between two of the n variables, e.g. x2 < x5, and the pair indices for each of the comparisons are fixed and given ahead of time. Also given: All pairs in C are unique (even when flipped, e.g. the pair x0, x1 means there is no pair x1, x0), and never compare a variable against itself. That means C has at most n*(n-1)/2 entries.
So the question is can I prove that my list C of m comparisons is sufficient to sort the list X? Obviously it would be if C was the largest possible length (had all possible comparisons). But what about shorter lists?
Then, if it has been proven that C contains enough information to sort, how do I then actually go about performing the sort.
Upvotes: 4
Views: 660
Reputation: 372714
Let's imagine that you have the collection of objects to be sorted and form a graph from them with one node per object. You're then given a list of pairs indicating how the comparisons go. You can think of these as edges in the graph: if you know that object x compares less than object y, then you can draw an edge from x to y.
Assuming that the results of the comparisons are consistent - that is, you don't have any cycles - you should have a directed acyclic graph.
Think about what happens if you topologically sort this DAG. What you'll end up with is one possible ordering of the values that's consistent with all of the constraints. The reason for this is that in a topological ordering, you won't place an element x before an element y if there is any transitive series of edges leading from y to x, and there's a transitive series of edges leading from y to x if there's a chain of comparisons that transitively indicates that y precedes x.
You can actually make a stronger claim: the set of all topological orderings of the DAG is exactly the set of all possible orderings that satisfy all the constraints. We've already argued that every topological ordering satisfies all the constraints, so all we need to do now is argue that every sequence satisfying all the constraints is a valid topological ordering. The argument here is essentially that if you obey all the constraints, you never place any element in the sequence before something that it transitively compares less than, so you never place any element in the sequence before something that has a path to it.
This then gives us a nice way to solve the problem: take the graph formed this way and see if it has exactly one topological ordering. If so, then that ordering is the unique sorted order. If not, then there are two or more orderings.
So how best to go about this? Well, one of the standard algorithms for doing a topological sort is to annotate each node with its indegree, then repeatedly pull off a node of indegree zero and adjust the indegrees of its successors. The DAG has exactly one topological ordering if in the course of performing this algorithm, at every stage there is exactly one node of indegree zero, since in that case the topological ordering is forced.
With the right setup and data structures, you can implement this to run in time O(n + m), where n is the number of nodes and m is the number of constraints. I'll leave those details as a proverbial exercise to the reader. :-)
Upvotes: 4
Reputation: 10882
Your problem can be reduced to the well-known Topological sort.
To prove that "C contains enough information to sort" is to prove the uniqueness of topological sort:
If a topological sort has the property that all pairs of consecutive vertices in the sorted order are connected by edges, then these edges form a directed Hamiltonian path in the DAG. If a Hamiltonian path exists, the topological sort order is unique; no other order respects the edges of the path. Conversely, if a topological sort does not form a Hamiltonian path, the DAG will have two or more valid topological orderings, for in this case it is always possible to form a second valid ordering by swapping two consecutive vertices that are not connected by an edge to each other. Therefore, it is possible to test in linear time whether a unique ordering exists, and whether a Hamiltonian path exists, despite the NP-hardness of the Hamiltonian path problem for more general directed graphs (Vernet & Markenzon 1997).
Upvotes: 2