Reputation: 8641
I'm looking for a nice implementation of topological sorting in scala.
The solution should be stable:
If input is already sorted, the output should be unchanged
The algorithm should be deterministic (hashCode has no effect)
I suspect there are libraries that can do this, but I wouldn't like to add nontrivial dependencies due to this.
Example problem:
case class Node(name: String)(val referenced: Node*)
val a = Node("a")()
val b = Node("b")(a)
val c = Node("c")(a)
val d = Node("d")(b, c)
val e = Node("e")(d)
val f = Node("f")()
assertEquals("Previous order is kept",
Vector(f, a, b, c, d, e),
topoSort(Vector(f, a, b, c, d, e)))
assertEquals(Vector(a, b, c, d, f, e),
topoSort(Vector(d, c, b, f, a, e)))
Here the order is defined such that if the nodes were say declarations in a programming language referencing other declarations, the result order would be such that no declaration is used before it has been declared.
Upvotes: 2
Views: 2032
Reputation: 8641
Here is my own solution. Additionnally it returns possible loops detected in the input.
The format of the nodes is not fixed because the caller provides a visitor that will take a node and a callback and call the callback for each referenced node.
If the loop reporting is not necessary, it should be easy to remove.
import scala.collection.mutable
// Based on https://en.wikipedia.org/wiki/Topological_sorting?oldformat=true#Depth-first_search
object TopologicalSort {
case class Result[T](result: IndexedSeq[T], loops: IndexedSeq[IndexedSeq[T]])
type Visit[T] = (T) => Unit
// A visitor is a function that takes a node and a callback.
// The visitor calls the callback for each node referenced by the given node.
type Visitor[T] = (T, Visit[T]) => Unit
def topoSort[T <: AnyRef](input: Iterable[T], visitor: Visitor[T]): Result[T] = {
// Buffer, because it is operated in a stack like fashion
val temporarilyMarked = mutable.Buffer[T]()
val permanentlyMarked = mutable.HashSet[T]()
val loopsBuilder = IndexedSeq.newBuilder[IndexedSeq[T]]
val resultBuilder = IndexedSeq.newBuilder[T]
def visit(node: T): Unit = {
if (temporarilyMarked.contains(node)) {
val loopStartIndex = temporarilyMarked.indexOf(node)
val loop = temporarilyMarked.slice(loopStartIndex, temporarilyMarked.size)
.toIndexedSeq
loopsBuilder += loop
} else if (!permanentlyMarked.contains(node)) {
temporarilyMarked += node
visitor(node, visit)
permanentlyMarked += node
temporarilyMarked.remove(temporarilyMarked.size - 1, 1)
resultBuilder += node
}
}
for (i <- input) {
if (!permanentlyMarked.contains(i)) {
visit(i)
}
}
Result(resultBuilder.result(), loopsBuilder.result())
}
}
In the example of the question this would be applied like this:
import TopologicalSort._
def visitor(node: BaseNode, callback: (Node) => Unit): Unit = {
node.referenced.foreach(callback)
}
assertEquals("Previous order is kept",
Vector(f, a, b, c, d, e),
topoSort(Vector(f, a, b, c, d, e), visitor).result)
assertEquals(Vector(a, b, c, d, f, e),
topoSort(Vector(d, c, b, f, a, e), visitor).result)
The worst case complexity of this solution is actually above O(n + m) because the temporarilyMarked
array is scanned for each node.
The asymptotic complexity would be improved if the temporarilyMarked
would be replaced with for example a HashSet
.
A true O(n + m) would be achieved if the marks were be stored directly inside the nodes, but storing them outside makes writing a generic solution easier.
I haven't run any performance tests, but I suspect scanning the temporarilyMarked
array is not a problem even in large graphs as long as they are not very deep.
I have very similar code is also published here. That version has a test suite which can be useful for experimenting and exploring the implementation.
Detecting loops can be useful for example in serialization situations where most of the data can be handled as a DAG, but loops can be handled with some kind of special arrangement.
The test suite in the Github code linked to in above section contains various cases with multiple loops.
Upvotes: 2
Reputation: 91
Here's a purely functional implementation that returns the topological ordering ONLY if the graph is acyclic.
case class Node(label: Int)
case class Graph(adj: Map[Node, Set[Node]]) {
case class DfsState(discovered: Set[Node] = Set(), activeNodes: Set[Node] = Set(), tsOrder: List[Node] = List(),
isCylic: Boolean = false)
def dfs: (List[Node], Boolean) = {
def dfsVisit(currState: DfsState, src: Node): DfsState = {
val newState = currState.copy(discovered = currState.discovered + src, activeNodes = currState.activeNodes + src,
isCylic = currState.isCylic || adj(src).exists(currState.activeNodes))
val finalState = adj(src).filterNot(newState.discovered).foldLeft(newState)(dfsVisit(_, _))
finalState.copy(tsOrder = src :: finalState.tsOrder, activeNodes = finalState.activeNodes - src)
}
val stateAfterSearch = adj.keys.foldLeft(DfsState()) {(state, n) => if (state.discovered(n)) state else dfsVisit(state, n)}
(stateAfterSearch.tsOrder, stateAfterSearch.isCylic)
}
def topologicalSort: Option[List[Node]] = dfs match {
case (topologicalOrder, false) => Some(topologicalOrder)
case _ => None
}
}
Upvotes: 1