akg
akg

Reputation: 533

Lazy evaluation of terms in an infinite list in Haskell

I am curious about the runtime performance of an infinite list like the one below:

fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

This will create an infinite list of the fibonacci sequence.

My question is that if I do the following:

takeWhile (<5) fibs

how many times does fibs evaluate each term in the list? It seems that since takeWhile checks the predicate function for each item in the list, the fibs list will evaluate each term multiple times. The first 2 terms are given for free. When takeWhile wants to evaluate (<5) on the 3rd element, we will get:

1 : 1 : zipWith (+) [(1, 1), (1)] => 1 : 1 : 3

Now, once takeWhile wants to evaluate (<5) on the 4th element: the recursive nature of fibs will construct the list again like the following:

1 : 1 : zipWith (+) [(1, 2), (2, 3)] => 1 : 1 : 3 : 5

It would seem that the 3rd element needs to be computed again when we want to evaluate the value of the 4th element. Furthermore, if the predicate in takeWhile is large, it would indicate the function is doing more work that is needed since it is evaluating each preceding element in the list multiple times. Is my analysis here correct or is Haskell doing some caching to prevent multiple evaluations here?

Upvotes: 43

Views: 4703

Answers (4)

Daniel Fischer
Daniel Fischer

Reputation: 183858

Illustration:

module TraceFibs where

import Debug.Trace

fibs :: [Integer]
fibs = 0 : 1 : zipWith tadd fibs (tail fibs)
  where
    tadd x y = let s = x+y
               in trace ("Adding " ++ show x ++ " and " ++ show y
                                   ++ "to obtain " ++ show s)
                        s

Which produces

*TraceFibs> fibs !! 5
Adding 0 and 1 to obtain 1
Adding 1 and 1 to obtain 2
Adding 1 and 2 to obtain 3
Adding 2 and 3 to obtain 5
5
*TraceFibs> fibs !! 5
5
*TraceFibs> fibs !! 6
Adding 3 and 5 to obtain 8
8
*TraceFibs> fibs !! 16
Adding 5 and 8 to obtain 13
Adding 8 and 13 to obtain 21
Adding 13 and 21 to obtain 34
Adding 21 and 34 to obtain 55
Adding 34 and 55 to obtain 89
Adding 55 and 89 to obtain 144
Adding 89 and 144 to obtain 233
Adding 144 and 233 to obtain 377
Adding 233 and 377 to obtain 610
Adding 377 and 610 to obtain 987
987
*TraceFibs>

Upvotes: 31

newacct
newacct

Reputation: 122429

Think of it this way. The variable fib is a pointer to a lazy value. (You can think of a lazy value underneath as a data structure like (not real syntax) Lazy a = IORef (Unevaluated (IO a) | Evaluated a); i.e. it starts out as unevaluated with a thunk; then when it is evaluated it "changes" to something that remembers the value.) Because the recursive expression uses the variable fib, they have a pointer to the same lazy value (they "share" the data structure). The first time someone evaluates fib, it runs the thunk to get the value and that value is remembered. And because the recursive expression points to the same lazy data structure, when they evaluate it, they will see the evaluated value already. As they traverse the lazy "infinite list", there will only be one "partial list" in memory; zipWith will have two pointers to "lists" which are simply pointers to previous members of the same "list", due to the fact that it started with pointers to the same list.

Note that this is not really "memoizing"; it's just a consequence of referring to the same variable. There is generally no "memoizing" of function results (the following will be inefficient):

fibs () = 0 : 1 : zipWith tadd (fibs ()) (tail (fibs ()))

Upvotes: 4

Don Stewart
Don Stewart

Reputation: 137947

This is a self-referential, lazy data structure, where "later" parts of the structure refer to earlier parts by name.

Initially, the structure is just a computation with unevaluated pointers back to itself. As it is unfolded, values are created in the structure. Later references to already-computed parts of the structure are able to find the value already there waiting for them. No need to re-evaluate the pieces, and no extra work to do!

The structure in memory begins as just an unevaluated pointer. Once we look at the first value, it looks like this:

> take 2 fibs

enter image description here

(a pointer to a cons cell, pointing at '1', and a tail holding the second '1', and a pointer to a function that holds references back to fibs, and the tail of fibs.

Evaluating one more step expands the structure, and slides the references along:

enter image description here

And so we go unfolding the structure, each time yielding a new unevaluated tail, which is a closure holding references back to 1st and 2nd elements of the last step. This process can continue infinitely :)

And because we're referring to prior values by name, GHC happily retains them in memory for us, so each item is evaluated only once.

Upvotes: 82

dflemstr
dflemstr

Reputation: 26157

When something is evaluated in Haskell, it stays evaluated, as long as it's referenced by the same name1.

In the following code, the list l is only evaluated once (which might be obvious):

let l = [1..10]
print l
print l -- None of the elements of the list are recomputed

Even if something is partially evaluated, that part stays evaluated:

let l = [1..10]
print $ take 5 l -- Evaluates l to [1, 2, 3, 4, 5, _]
print l          -- 1 to 5 is already evaluated; only evaluates 6..10

In your example, when an element of the fibs list is evaluated, it stays evaluated. Since the arguments to zipWith reference the actual fibs list, it means that the zipping expression will use the already partially computed fibs list when computing the next elements in the list. This means that no element is evaluated twice.

1This is of course not strictly required by the language semantics, but in practice this is always the case.

Upvotes: 19

Related Questions