Do threaded binary tree structures offer any advantages in Haskell?

I was reading this mailing list post, where someone had a question about a threaded RB tree in Haskell, and the end of the response said:

I suggest you (Lex) either go imperative (with STRef or IORef) or do without threading, unless you're sure that you'll be doing many more lookups and traversals than inserts and deletes.

Implying that, although creating threaded trees in Haskell is generally not a good idea, it does still make lookups and traversals more efficient without resorting to imperative algorithms.

However, I can't think of a way threads could make haskell trees more efficient without using imperative constructs. Is it even possible?

Upvotes: 2

Answers (1)

John L

Reputation: 28097

However, I can't think of a way threads could make haskell trees more efficient without using imperative constructs. Is it even possible?

Technically, exactly the same benefits that apply to an imperative threaded tree also apply to a persistent threaded tree. However, due to some extra costs of such a data structure, it's not always a practical choice.

Consider the situation where you have a tree that won't be modified, however you'll frequently need to make linear traversals (e.g. find a node and the subsequent n nodes, or a full linear traversal, etc). In an imperative language, a threaded tree can be more efficient in this case than a non-threaded tree because linear traversals can be performed directly, without keeping a stack. It should be clear that this is exactly the same case as with a persistent structure because of our assumption that the tree won't be modified, so clearly linear traversals of the threaded tree would be more efficient in the persistent structure too.

So, what are the downsides of a persistent threaded tree? First, inserts/deletes will be much more expensive than in an ordinary tree, because every node prior to the modified node will also need to be recreated. So the structure will only be beneficial when mutations are rare or nonexistent. But in that case, you're probably better off creating an array from the tree and traversing that (unless you want to lookup a starting position). So it ends up being a rather complicated data structure that would only be used in very limited circumstances. But for that very specific use case, it would be more efficient than a plain binary tree.

Edit: here's an example of how to implement a threaded binary tree purely. Implementation of deletions is left as an exercise, no attempt has been made to keep the tree balanced, and I make no promises about correctness. But after building up a Tree using Prelude.foldl Threaded.insert Threaded.empty, both Data.Foldable.toList and foldThread (:[]) return the same list, so it's probably pretty close to correct.

{-# LANGUAGE DeriveFoldable #-}
module Threaded where

import Control.Applicative
import Control.Monad
import Data.Foldable (Foldable (..))
import Data.Monoid

newtype Tree a = Tree {unTree :: Maybe (NonNullTree a) }
  deriving (Eq, Foldable)

-- its a little easier to work with non-null trees.
data NonNullTree a = Bin (Link a) a (Link a)

data Link a =
    Normal (NonNullTree a)  -- a child branch
  | Thread (NonNullTree a)  -- a thread to another branch
  | Null                    -- left child of min value, or right child of max value
  -- N.B. don't try deriving type class instances, such as Eq or Show.  If you derive
  -- them, many of the derived functions will be infinite loops.  If you want instances
  -- for Show or Eq, you'll have to write them by hand and break the loops by
  -- not following Thread references.    

empty :: Tree a
empty = Tree Nothing

singleton :: a -> Tree a
singleton a = Tree . Just $ Bin Null a Null

instance Foldable NonNullTree where
    foldMap f (Bin l a r) = mconcat [foldMap f l, f a, foldMap f r]

-- when folding, we only want to follow actual children, not threads.
-- Using this instance, we can compare with folding via threads.
instance Foldable Link where
    foldMap f (Normal t) = foldMap f t
    foldMap f _ = mempty

-- |find the first value in the tree >= the search term
-- O(n) complexity, we can do better!
tlookup :: Ord a => Tree a -> a -> Maybe a
tlookup tree needle = getFirst $ foldMap search tree
  where
    search a = if a >= needle then First (Just a) else mempty

-- | fold over the tree by following the threads.  The signature matches `foldMap` for easy
-- comparison, but `foldl'` or `traverse` would likely be more common operations.
foldThread :: Monoid m => (a -> m) -> Tree a -> m
foldThread f (Tree (Just root)) = deep mempty root
  where
    -- descend to the leftmost child, then follow threads to the right.
    deep acc (Bin l a r) = case l of
        Normal tree -> deep acc tree
        _ -> follow (acc `mappend` f a) r
    follow acc (Normal tree) = deep acc tree
    -- in this case we know the left child is a thread pointing to the
    -- current node, so we can ignore it.
    follow acc (Thread (Bin _ a r)) = follow (acc `mappend` f a) r
    follow acc Null = acc

-- used internally.  sets the left child of the min node to the 'prev0' link,
-- and the right child of the max node to the 'next0' link.
relinkEnds :: Link a -> Link a -> NonNullTree a -> NonNullTree a
relinkEnds prev0 next0 root = case go prev0 next0 root of
    Normal root' -> root'
    _ -> error "relinkEnds: invariant violation"
  where
    go prev next (Bin l a r) =
        -- a simple example of knot-tying.
        -- * l' depends on 'this'
        -- * r' depends on 'this'
        -- * 'this' depends on both l' and r'
        -- the whole thing works because Haskell is lazy, and the recursive 'go'
        -- function never actually inspects the 'prev' and 'next' arguments.
        let l' = case l of
                Normal lTree -> go prev (Thread this) lTree
                _ -> prev
            r' = case r of
                Normal rTree -> go (Thread this) next rTree
                _ -> next
            this = Bin l' a r'
        in Normal this

-- | insert a value into the tree, overwriting it if already present.
insert :: Ord a => Tree a -> a -> Tree a
insert (Tree Nothing) a = singleton a
insert (Tree (Just root)) a = case go Null Null root of
    Normal root' -> Tree $ Just root'
    _ -> error "insert: invariant violation"
  where
    go prev next (Bin l val r) = case compare a val of
        LT ->
            -- ties a knot similarly to the 'relinkEnds' function.
            let l' = case l of
                      Normal lTree -> go prev thisLink lTree
                      _ -> Normal $ Bin prev a thisLink
                r' = case r of
                      Normal rTree -> Normal $ relinkEnds thisLink next rTree
                      _ -> next
                this = Bin l' val r'
                thisLink = Thread this
            in Normal this
        EQ ->
            let l' = case l of
                      Normal lTree -> Normal $ relinkEnds prev thisLink lTree
                      _ -> prev
                r' = case r of
                      Normal rTree -> Normal $ relinkEnds thisLink next rTree
                      _ -> next
                this = Bin l' a r'
                thisLink = Thread this
            in Normal this
        GT ->
            let l' = case l of
                      Normal lTree -> Normal $ relinkEnds prev thisLink lTree
                      _ -> prev
                r' = case r of
                      Normal rTree -> go thisLink next rTree
                      _ -> Normal $ Bin thisLink a next
                this = Bin l' val r'
                thisLink = Thread this
            in Normal this

Upvotes: 1

Do threaded binary tree structures offer any advantages in Haskell?

Answers (1)

Related Questions