Reputation: 97
I have Network data structure like this:
data Network = Empty | Node Double [Network] [Network]
Where the Double represents the node label, the first list is the parents, and the 2nd list is the children of a given node. Assume all edges are directed from root to leaves, and there are no cycles.
I have an edge data structure like this:
data Edge = Edge (Network, Network)
where the edge is directed from a->b for Edge (a,b)
I have written a function meant to make a list of all the tree edges of the network. To further define this point, every and all parent (incoming) edges to a node that has multiple parents (a reticulation node) are not tree-edges. I am considering the outgoing edge from a reticulation node, and all other edges in the network, to be a tree edge.
I attempted to use list comprehension for this task like such:
getTreeEdges :: Network -> [Edge]
getTreeEdges (Node label ps cs) = nub (getTreeEdgesHelp (Node label (Empty:[]) cs))
getTreeEdgesHelp :: Network -> [Edge]
--leaf node case, children is empty list
getTreeEdgesHelp n@(Node _ ps []) = [Edge (p, n) | p <- ps ]
--reticulation node and root case, there is more than one parent or parent is empty(root)
getTreeEdgesHelp (Node _ ps cs) | ((length ps) > 1||ps==(Empty:[])) = (concat (map getTreeEdgesHelp cs))
--interal node case
getTreeEdgesHelp n@(Node _ ps cs) = ([Edge (p, n) | p <- ps ])++( concat (map getTreeEdgesHelp cs))
As you can see, the list comprehension is used to store the parent edges for leaves and for internal edges, while the root and reticulation nodes skip over adding edges to the list and just recurse. Leaves end the recursion.
The problem I have is that the edges stored do not contain all of the expected information. More specifically the parent information of the parent nodes disappears. The other information remains, the label and the children are unaffected.
So for example, say I store the edge (a,b)
. The network b
is stored exactly correctly. The network a
has itself (the root node) and all the descendant nodes present and labelled properly, but they all lack any entry for their parents, the parents to every node is []
.
Here is an example of a given input, output, and expected output with newick format (a,b)c for a node c that has children a and b. I also include the parent(s) of a given node in square brackets after the label. For full disclosure, here is how I define Show
for the Network
and the Edge
.
instance Show Network where
show Empty = "_"
show (Node a ps []) = (show a)++"["++printlabels ps++"]"
show (Node a ps [x,y]) = "("++show x++","++show y ++ ")"++(show a)++"["++printlabels ps++"]"
show (Node a ps [x]) = "("++show x++")"++(show a)++"["++printlabels ps++"]"
printlabels :: [Network] -> [Char]
printlabels [] = " "
printlabels (Empty:_) = "root"
printlabels ((Node label _ _):ns) = (show label)++","++(printlabels ns)
Here is an example tree I have generated. Because the labels are rather long, I have substituted out letters to label the internal nodes and integers to label the leaves.
(((4.0[c],((0.0[f])f[e,h],(1.0[g],2.0[g])g[e])e[c])c[h],(0.0[f])f[e,h])h[d],3.0[d])d[root]
Is the representation of the following graph.
Running the function described above gives:
[(d,h),(h,c),(c,4.0),(c,e),(f,0.0),(e,g),(g,1.0),(g,2.0),(d,3.0)]
with
instance Show Edge where
show ( Edge ((Node labelu _ _), (Node labelv _ _)) ) = "("++(show labelu)++","++(show labelv)++")"
When I query a single edge, say the one at position 5 (e,g)
and rewrite Show
for Edge
to
instance Show Edge where
show ( Edge (u, v) ) = "("++(show u)++"=1, 2="++(show v)++")"
I get (((1.0[ ],2.0[ ])g[ ],0.0[ ])e[ ]=1, 2=(1.0[g],2.0[g])g[e])
whereas the expected result would be (((1.0[g],2.0[g])g[e],(0.0[f])f[e,h])e[c]=1, 2=(1.0[g],2.0[g])g[e])
Another error I have noticed is that 0.0
is given as the sibling to g
which is not true. As a commenter pointed out, I am not changing any nodes when trying to get tree edges, I am just assigning the nodes to an appropriate edge pair and storing that in a list. You can also see the network I have input has correct parentage and structure.
What could be going on here? I am new to Haskell, am I using list comprehension wrong?
Upvotes: 1
Views: 133
Reputation: 60513
Your data structure is very unusual for a pure language and will prove extremely hard to work with. Although the graph you are representing is acyclic, the data structure you are using to represent it is cyclic: the child of a parent is the node again. Which means that when you modify something, you need to modify it cyclically: if I change a node, I must also change the corresponding child's parent, and the parents of each of its children, etc. This is very complicated and subtle to do.
This challenge is unique to pure languages: changing cyclic data structures is not a problem with mutable data structures -- you can just change it in one place and everywhere it is referenced can "see" the change.
Best would be to find an acyclic representation of your graph. For example: a graph is a list of edges. If you need it you could also include a Set of nodes (thus degree 0 nodes can be represented). Refer to nodes only by their labels, looking them up in the list whenever you need information about them.
An alternative approach is an inductive graph (implemented in the fgl library).
Upvotes: 2