How to ensure correct edges in graph

Question

I was trying to make a datatype for a graph in Haskell as follows:

type Vertex a = a
type Edge a = (Vertex a, Vertex a)
data Graph a = Combine [Vertex a] [Edge a]

This is a representation that worked for what I wanted to do, but I realized there could be edges for vertices which are not in the vertex-list.

My question is thus whether there is a possibility to assure every edge only contains vertices from the vertex-list?

I thought a lot about it already, but the best idea I got thus far was some function that makes a new graph with all missing vertices from the edge-list added to the vertex-list. Something like:

fix_graph :: Graph a -> Graph a
fix_graph (Combine v e) = Combine (removeDuplicates (v ++ [x | (x,_) <- e] ++ [y | (_,y) <- e])) e

removeDuplicates :: [t] -> [t]
...

Because this idea did not really satisfy me (also because I didn't take the time to implement it well), I wondered whether it would be possible to have a data constructor that adds the vertices from the edges which are not in the vertex-list yet immediately.

I've already read through the answers here, but I'm not really fond of the adjacency-representation used there. I know I am being annoying, but I would just like to get to know whether there aren't any other possibilities to solve this problem.

If anybody could help me with a solution or with getting rid of my illusions, it would be helpful...

Thanks in advance

CR Drost · Accepted Answer

So there are a couple different options:

Tying the knot

There are lots of ways to encode graphs in Haskell. The absolute simplest is to use a process called "tying the knot" to create circularity in a tree data structure. So for example to encode this graph:

                  .--.
  A -- B -- C -- D -./
  |    |    |    |
  E -- F    G -- H
  |              |
  +--------------+

You can simply write a node as its name and list of children:

data Node = Node String [Node]
instance Eq Node where
    Node a _ == Node b _ = a == b

my_graph = [a, b, c, d, e, f, g, h] where
    a = Node "A" [b, e]
    b = Node "B" [a, c, f]
    c = Node "C" [b, d, g]
    d = Node "D" [c, d, h]
    e = Node "E" [a, f, h]
    f = Node "F" [b, e]
    g = Node "G" [c, h]
    h = Node "H" [d, e, g]

This has a lot of convenience: you can now walk through the data structure like any other Haskell data structure, with tail-recursive functions. To terminate on cycles, on recursion you tack the current item onto a path variable and the first thing that your recursive logic should say is, | nodeelempath = ... to handle the cycles however you want.

The flip side is that your minor consistency issues have blown up a bit into really thorny consistency issues. Consider for example the difference between these two:

-- A has one child, which is itself; B has one child, which is A.
let a = Node "A" [a]; b = Node "B" [a] in [a, b]

-- this looks almost the same but if you descend far enough on B you find an
-- impostor node with the wrong children.
let a = Node "A" [a]
    impostor = Node "A" [b]
    b = Node "B" [Node "A" [Node "A" [impostor]]] 
in [a, b]

So that sucks and the only real answer I have for it is, "normalize by converting to one of the below...".

Anyway, the above trick also goes by the names mutual recursion and letrec, and basically means that within a where or let clause, all of the definitions that you put there can "see each other". It is not a property of laziness; you can make the above data structure in a strict language too -- but the language design for a functional strict language which understands mutually-recursive definitions this way might be a little difficult. (With a non-functional language, you just create the pointers as you need.)

Explicit numerology

Now think about how you'd take such a graph as we've got above, and convert it to your representation. The easiest way would involve going through a middle-man step which contains an Array:

import From.Above.Code (Node)
import Data.Array

type Graph = Array [Int]

graph :: [Node] -> Maybe Graph
graph nodes = fmap (array (1, length nodes)) . sequence $ map format nodes where
    indices = zip nodes [1..]
    pair x y = (x, y)
    format node@(Node _ children) = do -- in the Maybe monad
        index <- lookup node indices
        index_list <- sequence $ map (flip lookup indices) children
        return (index, index_list)

Now, this has a lot fewer consistency issues, which can now all be alleviated programmatically. However, those consistency issues can serve a purpose if you want to programmatically create such a graph with the State monad, and you want to temporarily leave the data structure in an inconsistent state until the proper node is read. The only disadvantage is that when you write the graph into your file, it looks a bit harder to understand because numbers are less friendly than strings:

 array (1, 8) [
     (1, [2, 5]),
     (2, [1, 3, 6]),
     (3, [2, 4, 7]),
     (4, [3, 4, 8]),
     (5, [1, 6, 8]),
     (6, [2, 5]),
     (7, [3, 8]),
     (8, [4, 5, 7])]

You can solve this with, say, a Map String [String] for the tradeoff that accesses become O(log n). In any case you should learn that representation: you will want to convert to an IntMap [Int] and back when you to do your "completeness checks" that you're proposing.

Once you've got these, it turns out that you can use a backing Array Int Node to create a recursive [Node] as above:

nodesFromArray arr = nodes where
    makeNode index children = Node (show index) [backingArray ! c | c <- children]
    backingArray = array (bounds arr) [(i, makeNode i c) | (i, c) <- assocs arr]
    nodes = map makeNode arr

Lists of edges

Once you've got the above lists (either Map.toList or Array.assocs), lists of edges become very easy to make:

edges_from_array = concatMap . uncurry (fmap . pair) . assocs

The flip-side is a little more complicated and accomplishes what you're trying to do directly:

import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map as Map
import qualified Data.Set as Set

makeGraphMap vertices edges = add edges (Map.fromList $ blankGraph vertices) where
     blankGraph verts = zip verts (repeat Set.empty)
     setInsert x Nothing = Just $ Set.singleton x
     setInsert x (Just set) = Just $ Set.insert x set
     add [] graphMap = fmap Set.toList graphMap
     add ((a, b) : es) graphMap = Map.alter (setInsert b) a verts

That is, we walk the list of edges with a map which maps keys to their sets of children; we initialize this to the list of vertices mapping to empty sets (so that we can have disconnected single nodes in our graphs), then walk through the edges by inserting the value to the set at the key, creating that set if we don't see the key.

How to ensure correct edges in graph

Answers (1)

Tying the knot

Explicit numerology

Lists of edges

Related Questions