Reputation: 4402
So I wanted to write a graph breadth-first search. The algorithm keeps track of some values in it's state. Those are: a visited
state for each node and a queue.
It also needs to know the edges of the graph and what it's target destination is, but that doesn't change in steps.
This is what I came up with (sorry for the uglyness)
import Prelude hiding (take, cons, drop)
import Data.Vector
type BFSState = (Vector Bool, Vector [Int], [Int], Int)
bfsStep :: BFSState -> BFSState
bfsStep (nodes, edges, current:queue, target)
| current == target = (nodes, edges, [], target)
| nodes ! current = (nodes, edges, queue, target)
| otherwise = (markedVisited, edges, queue Prelude.++ (edges ! current), target)
where
markedVisited = (take current nodes) Data.Vector.++ (cons True (drop (current + 1) nodes))
bfsSteps :: BFSState -> [BFSState]
bfsSteps init = steps
where steps = init : Prelude.map bfsStep steps
The bfsStep
takes a state and produces the next one. When the state's queue is [], the target node has been found. bfsSteps
just uses a self-referring list to make a list of BFSState
s. Now, currently there's no way to find out how many steps it takes to get to a certain node (given the starting conditions) but the bfsSteps
function will produce the steps that the algorithm took.
What I'm concerned about is that the state gets copied every step. I realize that concatenation with ++ doesn't perform well but I feel that it honestly doesn't matter since ALL of the state gets copied every step.
I know there are monads that should pretty much do what I'm doing here, but since Haskell is pure, doesn't it means that monads still have to copy the state?
Shouldn't there be a way to say "Hey, I'm using these values only once in my code and I'm not storing them anywhere. You can just change them instead of making new ones"?
If Haskell did that by itself it would still allow me to keep the code pure, but make the execution fast.
Upvotes: 1
Views: 723
Reputation: 6778
Since the Edges
and target
never change, I rewrote bfsStep
to only return the new Nodes
and queue
. Also I used Data.Vector.modify
to do an in-place update of Nodes
, instead of the awkward take/drop/cons
method that was used previously.
Also, bfsStep
can be written more succinctly as iterate
from the Prelude
.
Now, everything in bfs
is O(1)
except for the O(n)
append on the queue
. However, (++)
is only O(n)
in the length of its first argument, so if the number of edges per vertex is small it will be quite efficient.
import Data.Vector (Vector)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as M
type Nodes = Vector Bool
type Edges = Vector [Int]
bfs :: Nodes -> Edges -> [Int] -> Int -> (Nodes, [Int])
bfs nodes edges (x:xs) target
| x == target = (nodes, [])
| nodes V.! x = (nodes, xs)
| otherwise = (marked, edges V.! x ++ xs)
where marked = V.modify (\v -> M.write v x True) nodes
bfsSteps :: Nodes -> Edges -> [Int] -> Int -> [(Nodes, [Int])]
bfsSteps nodes edges queue target =
iterate (\(n, q) -> bfs n edges q target) (nodes, queue)
Upvotes: 2
Reputation: 707
You may be interested in reading the first section or two of my Monad Reader article: Lloyd Allison’s Corecursive Queues: Why Continuations Matter, which uses self reference to implement an efficient queue. There's also code available on hackage as control-monad-queue. In fact I first discovered this trick when implementing a reasonably efficient breadth-first graph reachability algorithm, although I used functional data structures for tracking what the algorithm has already seen.
If you really want to stick with imperative data structures for tracking where you've been, I do recommend the ST monad. Unfortunately getting ST to work with the type of queue I mentioned above is a bit hacky; I'm not sure I can recommend that combination, although from an FP mindset there's nothing too wrong with that combination.
With a more imperative approach, you are probably best off with the traditional two stack queue, or if you really want some extra performance, implementing an imperative queue based on chunks of imperative arrays.
Upvotes: 3
Reputation: 89073
Your state is only copied when it's modified - not when it's used.
For example, edges :: Vector [Int]
is never modified by bfsStep
, so the same value is reused throughout all the recursive calls.
On the other hand, your queue :: [Int]
is modified by bfsStep in two ways:
current : queue
- but this reuses the tail of the original queue, so no copying is donePrelude.++
. This requires O(queue size)
copying.You have similarly copying required when you update your nodes :: Vector Int
to include a new node.
There's a couple ways you could do less copying of your queue
and a couple ways to do less copying of your nodes
.
For nodes
you could wrap your computation in the ST s
monad to use a single modifiable vector. Alternately you could use a functional data structure like an IntMap
which has fairly fast update.
For your queue
you could use Data.Sequence, or a two list implementation.
Upvotes: 3