Reputation: 4402

Optimizing BFS implemented in Haskell

So I wanted to write a graph breadth-first search. The algorithm keeps track of some values in it's state. Those are: a visited state for each node and a queue. It also needs to know the edges of the graph and what it's target destination is, but that doesn't change in steps.

This is what I came up with (sorry for the uglyness)

import Prelude hiding (take, cons, drop)
import Data.Vector

type BFSState = (Vector Bool, Vector [Int], [Int], Int)
bfsStep :: BFSState -> BFSState
bfsStep (nodes, edges, current:queue, target)
    | current == target = (nodes, edges, [], target)
    | nodes ! current = (nodes, edges, queue, target)
    | otherwise = (markedVisited, edges, queue Prelude.++ (edges ! current), target)
    where
        markedVisited = (take current nodes) Data.Vector.++ (cons True (drop (current + 1) nodes))

bfsSteps :: BFSState -> [BFSState]
bfsSteps init = steps
    where steps = init : Prelude.map bfsStep steps

The bfsStep takes a state and produces the next one. When the state's queue is [], the target node has been found. bfsSteps just uses a self-referring list to make a list of BFSStates. Now, currently there's no way to find out how many steps it takes to get to a certain node (given the starting conditions) but the bfsSteps function will produce the steps that the algorithm took.

What I'm concerned about is that the state gets copied every step. I realize that concatenation with ++ doesn't perform well but I feel that it honestly doesn't matter since ALL of the state gets copied every step.

I know there are monads that should pretty much do what I'm doing here, but since Haskell is pure, doesn't it means that monads still have to copy the state?

Shouldn't there be a way to say "Hey, I'm using these values only once in my code and I'm not storing them anywhere. You can just change them instead of making new ones"?

If Haskell did that by itself it would still allow me to keep the code pure, but make the execution fast.

Upvotes: 1

Answers (3)

cdk

Reputation: 6778

Since the Edges and target never change, I rewrote bfsStep to only return the new Nodes and queue. Also I used Data.Vector.modify to do an in-place update of Nodes, instead of the awkward take/drop/cons method that was used previously.

Also, bfsStep can be written more succinctly as iterate from the Prelude.

Now, everything in bfs is O(1) except for the O(n) append on the queue. However, (++) is only O(n) in the length of its first argument, so if the number of edges per vertex is small it will be quite efficient.

import Data.Vector (Vector)                                      
import qualified Data.Vector         as V
import qualified Data.Vector.Mutable as M                    

type Nodes = Vector Bool            
type Edges = Vector [Int]

bfs :: Nodes -> Edges -> [Int] -> Int -> (Nodes, [Int])
bfs nodes edges (x:xs) target              
    | x == target = (nodes, [])         
    | nodes V.! x = (nodes, xs)         
    | otherwise   = (marked, edges V.! x ++ xs)
    where marked = V.modify (\v -> M.write v x True) nodes 

bfsSteps :: Nodes -> Edges -> [Int] -> Int -> [(Nodes, [Int])]
bfsSteps nodes edges queue target = 
    iterate (\(n, q) -> bfs n edges q target) (nodes, queue)

Upvotes: 2

lpsmith

Reputation: 707

You may be interested in reading the first section or two of my Monad Reader article: Lloyd Allison’s Corecursive Queues: Why Continuations Matter, which uses self reference to implement an efficient queue. There's also code available on hackage as control-monad-queue. In fact I first discovered this trick when implementing a reasonably efficient breadth-first graph reachability algorithm, although I used functional data structures for tracking what the algorithm has already seen.

If you really want to stick with imperative data structures for tracking where you've been, I do recommend the ST monad. Unfortunately getting ST to work with the type of queue I mentioned above is a bit hacky; I'm not sure I can recommend that combination, although from an FP mindset there's nothing too wrong with that combination.

With a more imperative approach, you are probably best off with the traditional two stack queue, or if you really want some extra performance, implementing an imperative queue based on chunks of imperative arrays.

Upvotes: 3

rampion

Reputation: 89073

Your state is only copied when it's modified - not when it's used.

For example, edges :: Vector [Int] is never modified by bfsStep, so the same value is reused throughout all the recursive calls.

On the other hand, your queue :: [Int] is modified by bfsStep in two ways:

splitting it into current : queue - but this reuses the tail of the original queue, so no copying is done
appending to it with Prelude.++. This requires O(queue size) copying.

You have similarly copying required when you update your nodes :: Vector Int to include a new node.

There's a couple ways you could do less copying of your queue and a couple ways to do less copying of your nodes.

For nodes you could wrap your computation in the ST s monad to use a single modifiable vector. Alternately you could use a functional data structure like an IntMap which has fairly fast update.

For your queue you could use Data.Sequence, or a two list implementation.

Upvotes: 3

Optimizing BFS implemented in Haskell

Answers (3)

Related Questions