Reputation: 3122
I'm trying to implement a tree-processing algorithm in Haskell, and (due to the fact this is my first Haskell program!), am struggling with the design of the data structures. Can any of the FP gurus out there lend a hand?
I'll start by describing the important characteristics of the algorithm, sketch out how I would approach this using an imperative language, and finish with the stumbling baby-steps I have made so far in Haskell.
I won't describe the full algorithm in detail, but the salient points are as follows:
The data structures must therefore have the following characteristics:
If I was to implement this algorithm using an imperative language, the solution would look something like the following.
Let's assume that the starting point is the following definition of the input tree:
struct node {
// Identifier for this node, unique within the containing tree
size_t id;
// Label of this node
enum label label;
// Attributes of this node
// An attribute can be assumed to be a key-value pair
// Details of the attributes themselves aren't material to this
// discussion, so the "attribute" type is left opaque
struct attribute **attributes;
size_t n_attributes;
// Pointer to parent of this node
// NULL iff this node is root
struct node *parent;
// Pointer to first child of this node
// NULL iff this node is leaf
struct node *child;
// Double-linked list of siblings of this node
struct node *prev;
struct node *next;
};
The pointers embedded in each node clearly support the up / down / left / right traversals required by the algorithm.
Annotation can be implemented by defining the following structure:
struct algo_node {
// Pointer to input node which has been wrapped
struct node *node;
// Derived properties computed by first phase of the algorithm
// Details of the properties themselves aren't material to this
// discussion, so the "derived" type is left opaque
struct derived props;
// Pointer to corresponding node in the other tree
// NULL iff this node is unmatched
struct node *match;
};
The first phase of the algorithm constructs an algo_node
for each node
in each of the input trees.
Mapping from an algo_node
to a node
is trivial: follow the embedded *node
pointer. Mapping in the other direction can be supported by storing algo_node
s in an array, indexed by the id
of the input node.
This is of course just one possible implementation. Many variations are possible, including
list
or queue
interface, rather than storing three raw pointersstruct algo_node
Let's start with the following definition of the input tree:
data Tree = Leaf Label Attributes
| Node Label Attributes [Tree]
Augmentation of each node with an id can be achieved as follows:
data AnnotatedTree = Tree Int
addIndex :: Int -> AnnotatedTree -> (AnnotatedTree, Int)
indexedTree = addIndex 0 tree
Similarly, we can write a function which computes derived properties:
data AnnotatedTree = Tree DerivedProperties
computeDerived :: DerivedProperties -> AnnotatedTree -> (AnnotatedTree, DerivedProperties)
derivedTree = computeDerived DefaultDerived tree
The above snippets can be adjusted with little work, such that AnnotatedTree
contains both the index and the derived properties.
However, I don't know where to begin with representing mappings between the two trees. Based on some reading, I have some half-baked ideas ...
AnnotatedTree
to contain a path from the root of the other tree to the mapped node - encoded as a list of indices into each successive children list, [Integer]
AnnotatedTree
to contain a reference to the mapped node directly, as a Maybe Tree
... but I could really do with some guidance on which (if any) of these are worth pursuing.
Any help would be much appreciated!
Upvotes: 18
Views: 581
Reputation: 12749
You can label the tree nodes with Int
id's, and walk around them with zippers (using Data.Tree
and Data.Tree.Zipper
is a good idea because there's no need to reinvent the wheel). You can then attach auxiliary attributes to the nodes using a Data.IntMap
to map node id's to whatever you want. In particular, you can create an IntMap
to map from a node id to the TreePos Full Int
for that node so you can explore the parent, siblings, and children of that node.
Upvotes: 1