Peter
Peter

Reputation: 3145

Avoid list concatenation during in-order binary tree traversal

Haskell beginner here: Inorder traversal of a binary tree is simple, e.g.:

data IntegerTree = Leaf Integer
                 | Node IntegerTree Integer IntegerTree

inorder :: IntegerTree -> [Integer]
inorder (Leaf n)     = [n]
inorder (Node l n r) = inorder l ++ [n] ++ inorder r

However, it seems to me that there must a more efficient implementation. Since lists are singly-linked, concatenating inorder l and [n] seems wasteful, especially since this operating is performed many times for a large tree. Can I avoid this problem by writing the same function differently?

I initially thought about this while trying to solve the towers of hanoi puzzle where a move list is constructed in a similar fashion and I expect that many problems can be solved with similar recursive algorithms.

Upvotes: 4

Views: 241

Answers (2)

chi
chi

Reputation: 116139

Nested ++, as the ones you have in your in-order visit, can be inefficient. This is because list1 ++ list2 will copy (the spine of) list1, as follows:

(1:2:3:[]) ++ list2
=
1: ((2:3:[]) ++ list2)
=
1:2: ((3:[]) ++ list2)
=
1:2:3: ([] ++ list2)
=
1:2:3:list2

Copying the first list might not be so bad if done once, but when we nest ++ as in

((list1 ++ list2) ++ list3) ++ list4

we copy the copy of the copy, which slows everything down, often making O(N^2) something that should be O(N).

When computing list1 ++ list2, a key idea is this one: if we only could keep a "pointer" to the end [] inside list1, we could avoid the copy, and simply rewrite it with a pointer to list2, we would obtain constant-time (destructive) append.

Now, do we have imperative-style mutability in Haskell? Not for regular lists. We could, however, transform out lists into "functions of the end", i.e. instead of writing

1:2:3:[]

for a list, we could write

\end -> 1:2:3:end

to represent the same data. The latter representation of lists is called a "difference list". Converting from regular to difference lists is simple:

type DList a = [a] -> [a]

toDList :: [a] -> DList a
toDlist = (++)

fromDlist :: DList a -> [a]
fromDlist dl = dl []

So far so good, but how to concatenate two DList a? We need to take something like

list1 = \end -> 1:2:3:end
list2 = \end -> 4:5:6:7:end

and return

concatenate list1 list2 = \end -> 1:2:3:4:5:6:7:end

It turns out that concatenate is simply function composition, (.). I.e. list1 . list2 is exactly the concatenation we need. When we evaluate

fromDList (list1 . list2)
-- i.e.
(list1 . list2) []

no copy is done, since the end of list1 is immediately linked to list2.

So, with this in mind, we can rewrite your code with minimal changes:

inorder :: IntegerTree -> [Integer]
inorder tree = fromDList (inorderDL tree)

inorderDL :: IntegerTree -> DList Integer
inorderDL (Leaf n)     = (n :)               -- that is, \end -> n: end
inorderDL (Node l n r) = inorderDL l . (n :) . inorderDL r

This will make no list copies, since when each sublist is generated at each recursive step, its tail won't be [], but the "pointer" to the next list to concatenate.

Finally, note that this approach, under a different light, is exactly what Willem used in his answer.

Upvotes: 5

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476557

You can pass an extra parameter which is the tail: a list of items after that item that still will be generated. This then looks like:

inorder :: IntegerTree -> [Integer]
inorder = go []
    where go tl (Leaf n) = n : tl
          go tl (Node l n r) = go (n : go tl r) l

Here tl is thus a list of elements that will follow after the end of the node. At the top level, that list is empty (hence the go []). When we see a leaf, we emit the item n wrapped in the Leaf, and then followed by the tail.

For the Node we thus will perform recursion on the left element, with as tail n : go tl r, this is thus the item of that node followed by the recursion on the right subtree where we use the given tail tl again.

Upvotes: 4

Related Questions