user618815
user618815

Reputation:

Detecting cycles of a graph(maybe directed or undirected) in Haskell

I started to solve this problem in the imperative way and it works (DFS with traditional three coloring techniques). However, it takes me triple time to figure out how to do it Haskell and I failed! Suppose I represent graph as list (or map) of a node with its adjacency nodes.

type Node = Int
type Graph = [(Node, [Node])]

Note the above representation can be directed or undirected. I also pass the seen set and finished set as arguments(since no side effects are preferred in functional) when doing exploration to detect back track edge. However, I just can't do it in Haskell! I know there may be using State monad, but that thing hasn't come through my mind quite well neither. I am curious to know how could anyone guide me how to do it in "beautiful" Haskell style?

Upvotes: 14

Views: 4051

Answers (3)

Theo Belaire
Theo Belaire

Reputation: 3020

There is the naive way of attempting it, that looks like this:

route :: Graph -> Label -> Label -> Bool
route g dest from | from == dest = True
route g dest from = any (route g dest) (neighbours g from)

But that fails at looping graphs. (I'm also assuming you have neighbours defined)

So, what to do but pass the list of already seen nodes through.

route2 :: Graph  -> Label -> Label -> [Label] -> Bool
route2 g dest from seen 
  | dest == from = True
  | otherwise    = any (\x -> route2 g dest x (from:seen)) (neighbours g from)

But if you were running it on the graph here: Dag You would get a trace that looked something like this (excuse the scheme, I've shamelessly stolen these pictures from my cs class. fr is find-route, and fr-l is a version of it that takes a list. The second parameter is the accumulator) Trace

As you can see, it ends up visiting the nodes K and H twice. This is bad, lets see why it's doing that.

Since it doesn't pass any information back up from the calls recursive in any, it can't see what it did in branches that failed, only what was on the path to the current node.

Now to fix that, there are two paths we can take. My class took a continuation passing approach that is rather novel, so I will show it first, before the state monad version.

routeC :: Graph -> Label -> Label -> [Label] -> ([Label] -> Bool) -> Bool
routeC g  dest from seen k 
  | dest == from     = True
  | from `elem` seen = k (from:seen)
  | otherwise        = routeCl g dest (neighbours g from) (from:seen) k

routeCl :: Graph -> Label -> [Label] -> [Label] -> ([Label] -> Bool) -> Bool
routeCl g dest []     seen k = k seen
routeCl g dest (x:xs) seen k = 
    routeC g dest x seen (\newSeen -> routeCl g dest xs newSeen k)

This uses a pair of functions, instead of any. routeC just checks to see if we're arrived at the destination, or if we've looped, otherwise it just calls routeCL with the neighbours of the current node.

If we have looped, then instead of just returning False, we call the continuation, but with the nodes that we've currently seen (including the current one).

routeCL takes a list of nodes, and if the list is empty, runs the continuation, otherwise it does something interesting. It runs routeC on the first node, and passes it a continuation that will run routeCl on the rest of the list, with the NEW list of seen nodes. So it will be able to see into the history of the failed branches.

(As an additional thing, we can generalize this a bit further, and fully transform it into continuation passing style. I've generalized any as well, instead of using the pair of functions. This is optional, and the type signature is scarier than the code.)

anyK :: (a -> s -> (s -> r) -> (s -> r) -> r) ->
        [a] -> s -> (s -> r) -> (s -> r) -> r
anyK p []     s tK fK = fK s
anyK p (x:xs) s tK fK = p x s tK (\s' -> anyK p xs s' tK fK)

routeK2 :: Graph -> Label -> Label -> ([Label] -> r) -> ([Label] -> r) -> r
routeK2 g dest from' trueK falseK = route from' [] trueK falseK
  where route from seen tK fK 
         | from == dest = tK seen
         | from `elem` seen = fK seen
         | otherwise = anyK route (neighbours g from) (from:seen) tK fK

Same thing, but with more information being passed in.

Now, for what you've been waiting for, the State Monad version.

routeS :: Graph -> Label -> Label -> State [Label] Bool
routeS g dest from | dest == from = return True
routeS g dest from = do
      seen <- get 
      if from `elem` seen then return False else do
      put (from:seen)
      anyM (routeS g dest) (neighbours g from)

But doesn't that last line look a lot like what we started with, just with some extra plumbing? Compare:

any  (route g dest)  (neighbours g from)  -- Simple version
anyM (routeS g dest) (neighbours g from)  -- State Version
anyK route         (neighbours g from) (from:seen) tK fK  -- CPS version

At the core, all three are doing the same thing. The monad in the state version just nicely handles the plumbing of the seen nodes for us. And the CPS version shows us exactly what the flow of control will be like, in a much more explicit fashion than the state monad.

Oh, but anyM doesn't seem to be in the standard library. Here's what it looks like:

anyM :: (Monad m) => (a -> m Bool) -> [a] -> m Bool
anyM p [] = return False
anyM p (x:xs) = do
    y <- p x
    if y
      then return True
      else anyM p xs

Upvotes: 5

dflemstr
dflemstr

Reputation: 26167

First of all, there's a data type for storing Graphs in Haskell; it's called Data.Graph.Graph in the containers package. It uses a Data.Array instead of a list, but is otherwise identical to your representation.

type Graph = Array Int [Int]

This representation leads to much more efficient graphs, while also using much less memory. I use this library like so:

import Data.Graph (Graph)
import qualified Data.Graph as Graph
import Data.Array

You presumably know the minimum and maximum nodes in your graph; if not, this function calculates them for you and creates a Graph:

makeGraph :: [(Node, [Node])] -> Graph
makeGraph list =
  array (minimum nodes, maximum nodes) list
  where
    nodes = map fst list

To see if a node is part of a cycle, one must check whether the nodes reachable from one node, excluding the node itself, contain that node. One can use the reachable function to get the nodes that are reachable from a given node (including that node). Since a Graph is an Array, one can use assocs to get back the list that it was built from, with type [(Node, [Node])]. We use these three facts to build two functions:

-- | Calculates all the nodes that are part of cycles in a graph.
cyclicNodes :: Graph -> [Node]
cyclicNodes graph =
  map fst . filter isCyclicAssoc . assocs $ graph
  where
    isCyclicAssoc = uncurry $ reachableFromAny graph

-- | In the specified graph, can the specified node be reached, starting out
-- from any of the specified vertices?
reachableFromAny :: Graph -> Node -> [Node] -> Bool
reachableFromAny graph node =
  elem node . concatMap (Graph.reachable graph)

If you are interested in how the reachable function works, I could go through all of it here, but it's fairly straight-forward to understand when you look at the code.

These functions are very efficient, but they could be vastly improved depending on how you want cycles to be represented in the end. You can for example use the stronglyConnComp function in Data.Graph to get a more streamlined representation.

Note that I'm abusing the fact that Node ~ Graph.Vertex ~ Int in this case, so if your Nodes change type, you need to use appropriate conversion functions in Data.Graph, like graphFromEdges, to get a Graph and associated conversion functions.

The fgl library is another alternative that also provides a complete suite of graph-related functionality that is extremely optimized.

Upvotes: 13

Daniel Wagner
Daniel Wagner

Reputation: 153162

I'd probably just cabal install fgl and use the built-in DFS functions like components and similar.

Upvotes: 1

Related Questions