Reputation: 3145
Haskell beginner here: Inorder traversal of a binary tree is simple, e.g.:
data IntegerTree = Leaf Integer
| Node IntegerTree Integer IntegerTree
inorder :: IntegerTree -> [Integer]
inorder (Leaf n) = [n]
inorder (Node l n r) = inorder l ++ [n] ++ inorder r
However, it seems to me that there must a more efficient implementation. Since lists are singly-linked, concatenating inorder l
and [n]
seems wasteful, especially since this operating is performed many times for a large tree. Can I avoid this problem by writing the same function differently?
I initially thought about this while trying to solve the towers of hanoi puzzle where a move list is constructed in a similar fashion and I expect that many problems can be solved with similar recursive algorithms.
Upvotes: 4
Views: 241
Reputation: 116139
Nested ++
, as the ones you have in your in-order visit, can be inefficient. This is because list1 ++ list2
will copy (the spine of) list1
, as follows:
(1:2:3:[]) ++ list2
=
1: ((2:3:[]) ++ list2)
=
1:2: ((3:[]) ++ list2)
=
1:2:3: ([] ++ list2)
=
1:2:3:list2
Copying the first list might not be so bad if done once, but when we nest ++
as in
((list1 ++ list2) ++ list3) ++ list4
we copy the copy of the copy, which slows everything down, often making O(N^2) something that should be O(N).
When computing list1 ++ list2
, a key idea is this one: if we only could keep a "pointer" to the end []
inside list1
, we could avoid the copy, and simply rewrite it with a pointer to list2
, we would obtain constant-time (destructive) append.
Now, do we have imperative-style mutability in Haskell? Not for regular lists. We could, however, transform out lists into "functions of the end", i.e. instead of writing
1:2:3:[]
for a list, we could write
\end -> 1:2:3:end
to represent the same data. The latter representation of lists is called a "difference list". Converting from regular to difference lists is simple:
type DList a = [a] -> [a]
toDList :: [a] -> DList a
toDlist = (++)
fromDlist :: DList a -> [a]
fromDlist dl = dl []
So far so good, but how to concatenate two DList a
? We need to take something like
list1 = \end -> 1:2:3:end
list2 = \end -> 4:5:6:7:end
and return
concatenate list1 list2 = \end -> 1:2:3:4:5:6:7:end
It turns out that concatenate
is simply function composition, (.)
. I.e. list1 . list2
is exactly the concatenation we need. When we evaluate
fromDList (list1 . list2)
-- i.e.
(list1 . list2) []
no copy is done, since the end of list1
is immediately linked to list2
.
So, with this in mind, we can rewrite your code with minimal changes:
inorder :: IntegerTree -> [Integer]
inorder tree = fromDList (inorderDL tree)
inorderDL :: IntegerTree -> DList Integer
inorderDL (Leaf n) = (n :) -- that is, \end -> n: end
inorderDL (Node l n r) = inorderDL l . (n :) . inorderDL r
This will make no list copies, since when each sublist is generated at each recursive step, its tail won't be []
, but the "pointer" to the next list to concatenate.
Finally, note that this approach, under a different light, is exactly what Willem used in his answer.
Upvotes: 5
Reputation: 476557
You can pass an extra parameter which is the tail: a list of items after that item that still will be generated. This then looks like:
inorder :: IntegerTree -> [Integer]
inorder = go []
where go tl (Leaf n) = n : tl
go tl (Node l n r) = go (n : go tl r) l
Here tl
is thus a list of elements that will follow after the end of the node. At the top level, that list is empty (hence the go []
). When we see a leaf, we emit the item n
wrapped in the Leaf
, and then followed by the tail.
For the Node
we thus will perform recursion on the left element, with as tail n : go tl r
, this is thus the item of that node followed by the recursion on the right subtree where we use the given tail tl
again.
Upvotes: 4