Pufan Jiang
Pufan Jiang

Reputation: 178

Balanced tree for functional symbol table

I'm doing exercises of "Modern Compiler Implementation in ML" (Andrew Appel). One of which (ex 1.1 d) is to recommend a balanced-tree data structure for functional symbol table. Appeal mentioned such data structure should rebalance on insertion but not on lookup. Being totally new to functional programming, I found this confusing. What is key insight on this requirement?

Upvotes: 1

Views: 182

Answers (2)

sshine
sshine

Reputation: 16105

Since Davislor already answered your question extensively, here are mostly some implementation hints. I would add that choice of data structure for your symbol table is probably not relevant for a toy compiler. Compilation time only starts to become an issue when you compiler is used on a lot of code and the code is recompiled often.

Sticking to a O(n) insert/lookup data structure is fine in practice until it isn't.

Signature-wise, all you want is a key-value mapping, insert, and lookup:

signature SymTab =
sig
  type id
  type value
  type symtab

  val empty : symtab
  val insert : id -> value -> symtab -> symtab
  val lookup : id -> symtab -> value option
end

A simple O(n) implementation with lists might be:

structure ListSymTab : SymTab =
struct
  type id = string
  type value = int
  type symtab = (id * value) list

  val empty = []
  fun insert id value [] = [(id, value)]
    | insert id value ((id',value')::symtab) =
        if id = id'
          then (id,value)::symtab
          else (id',value')::insert id value symtab
  fun lookup _ [] = NONE
    | lookup id ((id',value)::symtab) =
        if id = id' then SOME value else lookup id symtab
end

You might use it like:

- ListSymTab.lookup "hello" (ListSymTab.insert "hello" 42 ListSymTab.empty);
> val it = SOME 42 : int option

Then again, maybe your symbol table doesn't map strings to integers, or you may have one symbol table for variables and one for functions.

You could parameterise the id/value types using a functor:

functor ListSymTabFn (X : sig
                            eqtype id
                            type value
                          end) : SymTab =
struct
  type id = X.id
  type value = X.value
  (* The rest is the same as ListSymTab. *)
end

And you might use it like:

- structure ListSymTab = ListSymTabFn(struct type id = string type value = int end);
- ListSymTab.lookup "world" (ListSymTab.insert "hello" 42 ListSymTab.empty);
> val it = NONE : int option

All you need for a list-based symbol table is that the identifiers/symbols can be compared for equality. For your balanced-tree symbol table, you need identifiers/symbols to be orderable.

Instead of implementing balanced trees from scratch, look e.g. at SML/NJ's RedBlackMapFn:

To create a structure implementing maps (dictionaries) over a type T [...]:

structure MapT = RedBlackMapFn (struct
                                  type ord_key = T
                                  val compare = compareT
                                end)

Try this example with T as string and compare as String.compare:

$ sml
Standard ML of New Jersey v110.76 [built: Sun Jun 29 03:29:51 2014]
- structure MapS = RedBlackMapFn (struct
                                    type ord_key = string
                                    val compare = String.compare
                                  end);
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[library $SMLNJ-LIB/Util/smlnj-lib.cm is stable]
[autoloading done]
structure MapS : ORD_MAP?
- open MapS;
...

Opening the structure is an easy way to explore the available functions and their types.

We can then create a similar functor to ListSymTabFn, but one that takes an additional compare function:

functor RedBlackSymTabFn (X : sig
                                type id
                                type value
                                val compare : id * id -> order
                              end) : SymTab =
struct
  type id = X.id
  type value = X.value

  structure SymTabX = RedBlackMapFn (struct
                                       type ord_key = X.id
                                       val compare = X.compare
                                     end)

  (* The 'a map type inside SymTabX maps X.id to anything. *)
  (* We are, however, only interested in mapping to values. *)
  type symtab = value SymTabX.map

  (* Use other stuff in SymTabT for empty, insert, lookup. *)
end

Finally, you can use this as your symbol table:

structure SymTab = RedBlackSymTabFn(struct
                                      type id = string
                                      type value = int
                                      val compare = String.compare
                                    end);

Upvotes: 1

Davislor
Davislor

Reputation: 15134

A tree that’s rebalanced on every insertion and deletion doesn’t need to rebalance on lookup, because lookup doesn’t modify the structure. If it was balanced before a lookup, it will stay balanced during and after.

In functional languages, insertion and rebalancing can be more expensive than in a procedural one. Because you can’t alter any node in place, you replace a node by creating a new node, then replacing its parent with a new node whose children are the new daughter and the unaltered older daughter, and then replace the grandparent node with one whose children are the new parent and her older sister, and so on up. You finish when you create a new root node for the updated tree and garbage-collect all the nodes you replaced. However, some tree structures have the desirable property that they need to replace no more than O(log N) nodes of the tree on an insertion and can re-use the rest. This means that the rotation of a red-black tree (for example) has not much more overhead than an unbalanced insertion.

Also, you will typically need to query a symbol table much more often than you update it. It therefore becomes less tempting to try to make insertion faster: if you’re inserting, you might as well rebalance.

The question of which self-balancing tree structure is best for a functional language has been asked here, more than once.

Upvotes: 3

Related Questions