Reputation: 2876
I was trying to make a datatype for a graph in Haskell as follows:
type Vertex a = a
type Edge a = (Vertex a, Vertex a)
data Graph a = Combine [Vertex a] [Edge a]
This is a representation that worked for what I wanted to do, but I realized there could be edges for vertices which are not in the vertex-list.
My question is thus whether there is a possibility to assure every edge only contains vertices from the vertex-list?
I thought a lot about it already, but the best idea I got thus far was some function that makes a new graph with all missing vertices from the edge-list added to the vertex-list. Something like:
fix_graph :: Graph a -> Graph a
fix_graph (Combine v e) = Combine (removeDuplicates (v ++ [x | (x,_) <- e] ++ [y | (_,y) <- e])) e
removeDuplicates :: [t] -> [t]
...
Because this idea did not really satisfy me (also because I didn't take the time to implement it well), I wondered whether it would be possible to have a data constructor that adds the vertices from the edges which are not in the vertex-list yet immediately.
I've already read through the answers here, but I'm not really fond of the adjacency-representation used there. I know I am being annoying, but I would just like to get to know whether there aren't any other possibilities to solve this problem.
If anybody could help me with a solution or with getting rid of my illusions, it would be helpful...
Thanks in advance
Upvotes: 4
Views: 519
Reputation: 9807
So there are a couple different options:
There are lots of ways to encode graphs in Haskell. The absolute simplest is to use a process called "tying the knot" to create circularity in a tree data structure. So for example to encode this graph:
.--.
A -- B -- C -- D -./
| | | |
E -- F G -- H
| |
+--------------+
You can simply write a node as its name and list of children:
data Node = Node String [Node]
instance Eq Node where
Node a _ == Node b _ = a == b
my_graph = [a, b, c, d, e, f, g, h] where
a = Node "A" [b, e]
b = Node "B" [a, c, f]
c = Node "C" [b, d, g]
d = Node "D" [c, d, h]
e = Node "E" [a, f, h]
f = Node "F" [b, e]
g = Node "G" [c, h]
h = Node "H" [d, e, g]
This has a lot of convenience: you can now walk through the data structure like any other Haskell data structure, with tail-recursive functions. To terminate on cycles, on recursion you tack the current item onto a path
variable and the first thing that your recursive logic should say is, | node
elempath = ...
to handle the cycles however you want.
The flip side is that your minor consistency issues have blown up a bit into really thorny consistency issues. Consider for example the difference between these two:
-- A has one child, which is itself; B has one child, which is A.
let a = Node "A" [a]; b = Node "B" [a] in [a, b]
-- this looks almost the same but if you descend far enough on B you find an
-- impostor node with the wrong children.
let a = Node "A" [a]
impostor = Node "A" [b]
b = Node "B" [Node "A" [Node "A" [impostor]]]
in [a, b]
So that sucks and the only real answer I have for it is, "normalize by converting to one of the below...".
Anyway, the above trick also goes by the names mutual recursion and letrec, and basically means that within a where
or let
clause, all of the definitions that you put there can "see each other". It is not a property of laziness; you can make the above data structure in a strict language too -- but the language design for a functional strict language which understands mutually-recursive definitions this way might be a little difficult. (With a non-functional language, you just create the pointers as you need.)
Now think about how you'd take such a graph as we've got above, and convert it to your representation. The easiest way would involve going through a middle-man step which contains an Array
:
import From.Above.Code (Node)
import Data.Array
type Graph = Array [Int]
graph :: [Node] -> Maybe Graph
graph nodes = fmap (array (1, length nodes)) . sequence $ map format nodes where
indices = zip nodes [1..]
pair x y = (x, y)
format node@(Node _ children) = do -- in the Maybe monad
index <- lookup node indices
index_list <- sequence $ map (flip lookup indices) children
return (index, index_list)
Now, this has a lot fewer consistency issues, which can now all be alleviated programmatically. However, those consistency issues can serve a purpose if you want to programmatically create such a graph with the State monad, and you want to temporarily leave the data structure in an inconsistent state until the proper node is read. The only disadvantage is that when you write the graph into your file, it looks a bit harder to understand because numbers are less friendly than strings:
array (1, 8) [
(1, [2, 5]),
(2, [1, 3, 6]),
(3, [2, 4, 7]),
(4, [3, 4, 8]),
(5, [1, 6, 8]),
(6, [2, 5]),
(7, [3, 8]),
(8, [4, 5, 7])]
You can solve this with, say, a Map String [String]
for the tradeoff that accesses become O(log n)
. In any case you should learn that representation: you will want to convert to an IntMap [Int]
and back when you to do your "completeness checks" that you're proposing.
Once you've got these, it turns out that you can use a backing Array Int Node
to create a recursive [Node]
as above:
nodesFromArray arr = nodes where
makeNode index children = Node (show index) [backingArray ! c | c <- children]
backingArray = array (bounds arr) [(i, makeNode i c) | (i, c) <- assocs arr]
nodes = map makeNode arr
Once you've got the above lists (either Map.toList or Array.assocs), lists of edges become very easy to make:
edges_from_array = concatMap . uncurry (fmap . pair) . assocs
The flip-side is a little more complicated and accomplishes what you're trying to do directly:
import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map as Map
import qualified Data.Set as Set
makeGraphMap vertices edges = add edges (Map.fromList $ blankGraph vertices) where
blankGraph verts = zip verts (repeat Set.empty)
setInsert x Nothing = Just $ Set.singleton x
setInsert x (Just set) = Just $ Set.insert x set
add [] graphMap = fmap Set.toList graphMap
add ((a, b) : es) graphMap = Map.alter (setInsert b) a verts
That is, we walk the list of edges with a map which maps keys to their sets of children; we initialize this to the list of vertices mapping to empty sets (so that we can have disconnected single nodes in our graphs), then walk through the edges by inserting the value to the set at the key, creating that set if we don't see the key.
Upvotes: 2