Gareth Stockwell
Gareth Stockwell

Reputation: 3122

How to represent mapping between two trees in Haskell?

I'm trying to implement a tree-processing algorithm in Haskell, and (due to the fact this is my first Haskell program!), am struggling with the design of the data structures. Can any of the FP gurus out there lend a hand?

I'll start by describing the important characteristics of the algorithm, sketch out how I would approach this using an imperative language, and finish with the stumbling baby-steps I have made so far in Haskell.

The problem

I won't describe the full algorithm in detail, but the salient points are as follows:

The data structures must therefore have the following characteristics:

Sketch of an imperative solution

If I was to implement this algorithm using an imperative language, the solution would look something like the following.

Let's assume that the starting point is the following definition of the input tree:

struct node {
    // Identifier for this node, unique within the containing tree
    size_t id;

    // Label of this node
    enum label label;

    // Attributes of this node
    // An attribute can be assumed to be a key-value pair
    // Details of the attributes themselves aren't material to this
    // discussion, so the "attribute" type is left opaque
    struct attribute **attributes;
    size_t n_attributes;

    // Pointer to parent of this node
    // NULL iff this node is root
    struct node *parent;

    // Pointer to first child of this node
    // NULL iff this node is leaf
    struct node *child;

    // Double-linked list of siblings of this node
    struct node *prev;
    struct node *next;
};

The pointers embedded in each node clearly support the up / down / left / right traversals required by the algorithm.

Annotation can be implemented by defining the following structure:

struct algo_node {
    // Pointer to input node which has been wrapped
    struct node *node;

    // Derived properties computed by first phase of the algorithm
    // Details of the properties themselves aren't material to this
    // discussion, so the "derived" type is left opaque
    struct derived props;

    // Pointer to corresponding node in the other tree
    // NULL iff this node is unmatched
    struct node *match;
};

The first phase of the algorithm constructs an algo_node for each node in each of the input trees.

Mapping from an algo_node to a node is trivial: follow the embedded *node pointer. Mapping in the other direction can be supported by storing algo_nodes in an array, indexed by the id of the input node.

This is of course just one possible implementation. Many variations are possible, including

Moving to Haskell

Let's start with the following definition of the input tree:

data Tree = Leaf Label Attributes
          | Node Label Attributes [Tree]

Augmentation of each node with an id can be achieved as follows:

data AnnotatedTree = Tree Int

addIndex :: Int -> AnnotatedTree -> (AnnotatedTree, Int)

indexedTree = addIndex 0 tree

Similarly, we can write a function which computes derived properties:

data AnnotatedTree = Tree DerivedProperties

computeDerived :: DerivedProperties -> AnnotatedTree -> (AnnotatedTree, DerivedProperties)

derivedTree = computeDerived DefaultDerived tree

The above snippets can be adjusted with little work, such that AnnotatedTree contains both the index and the derived properties.

However, I don't know where to begin with representing mappings between the two trees. Based on some reading, I have some half-baked ideas ...

... but I could really do with some guidance on which (if any) of these are worth pursuing.

Any help would be much appreciated!

Upvotes: 18

Views: 581

Answers (1)

pat
pat

Reputation: 12749

You can label the tree nodes with Int id's, and walk around them with zippers (using Data.Tree and Data.Tree.Zipper is a good idea because there's no need to reinvent the wheel). You can then attach auxiliary attributes to the nodes using a Data.IntMap to map node id's to whatever you want. In particular, you can create an IntMap to map from a node id to the TreePos Full Int for that node so you can explore the parent, siblings, and children of that node.

Upvotes: 1

Related Questions