Reputation: 533
I am curious about the runtime performance of an infinite list like the one below:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
This will create an infinite list of the fibonacci sequence.
My question is that if I do the following:
takeWhile (<5) fibs
how many times does fibs
evaluate each term in the list? It seems
that since takeWhile
checks the predicate function for each item in
the list, the fibs
list will evaluate each term multiple times. The
first 2 terms are given for free. When takeWhile
wants to evaluate
(<5)
on the 3rd element, we will get:
1 : 1 : zipWith (+) [(1, 1), (1)] => 1 : 1 : 3
Now, once takeWhile
wants to evaluate (<5)
on the 4th element: the
recursive nature of fibs
will construct the list again like the
following:
1 : 1 : zipWith (+) [(1, 2), (2, 3)] => 1 : 1 : 3 : 5
It would seem that the 3rd element needs to be computed again when we
want to evaluate the value of the 4th element. Furthermore, if the
predicate in takeWhile
is large, it would indicate the function is
doing more work that is needed since it is evaluating each preceding
element in the list multiple times. Is my analysis here correct or is
Haskell doing some caching to prevent multiple evaluations here?
Upvotes: 43
Views: 4703
Reputation: 183858
Illustration:
module TraceFibs where
import Debug.Trace
fibs :: [Integer]
fibs = 0 : 1 : zipWith tadd fibs (tail fibs)
where
tadd x y = let s = x+y
in trace ("Adding " ++ show x ++ " and " ++ show y
++ "to obtain " ++ show s)
s
Which produces
*TraceFibs> fibs !! 5
Adding 0 and 1 to obtain 1
Adding 1 and 1 to obtain 2
Adding 1 and 2 to obtain 3
Adding 2 and 3 to obtain 5
5
*TraceFibs> fibs !! 5
5
*TraceFibs> fibs !! 6
Adding 3 and 5 to obtain 8
8
*TraceFibs> fibs !! 16
Adding 5 and 8 to obtain 13
Adding 8 and 13 to obtain 21
Adding 13 and 21 to obtain 34
Adding 21 and 34 to obtain 55
Adding 34 and 55 to obtain 89
Adding 55 and 89 to obtain 144
Adding 89 and 144 to obtain 233
Adding 144 and 233 to obtain 377
Adding 233 and 377 to obtain 610
Adding 377 and 610 to obtain 987
987
*TraceFibs>
Upvotes: 31
Reputation: 122429
Think of it this way. The variable fib
is a pointer to a lazy value. (You can think of a lazy value underneath as a data structure like (not real syntax) Lazy a = IORef (Unevaluated (IO a) | Evaluated a)
; i.e. it starts out as unevaluated with a thunk; then when it is evaluated it "changes" to something that remembers the value.) Because the recursive expression uses the variable fib
, they have a pointer to the same lazy value (they "share" the data structure). The first time someone evaluates fib
, it runs the thunk to get the value and that value is remembered. And because the recursive expression points to the same lazy data structure, when they evaluate it, they will see the evaluated value already. As they traverse the lazy "infinite list", there will only be one "partial list" in memory; zipWith
will have two pointers to "lists" which are simply pointers to previous members of the same "list", due to the fact that it started with pointers to the same list.
Note that this is not really "memoizing"; it's just a consequence of referring to the same variable. There is generally no "memoizing" of function results (the following will be inefficient):
fibs () = 0 : 1 : zipWith tadd (fibs ()) (tail (fibs ()))
Upvotes: 4
Reputation: 137947
This is a self-referential, lazy data structure, where "later" parts of the structure refer to earlier parts by name.
Initially, the structure is just a computation with unevaluated pointers back to itself. As it is unfolded, values are created in the structure. Later references to already-computed parts of the structure are able to find the value already there waiting for them. No need to re-evaluate the pieces, and no extra work to do!
The structure in memory begins as just an unevaluated pointer. Once we look at the first value, it looks like this:
> take 2 fibs
(a pointer to a cons cell, pointing at '1', and a tail holding the second '1', and a pointer to a function that holds references back to fibs, and the tail of fibs.
Evaluating one more step expands the structure, and slides the references along:
And so we go unfolding the structure, each time yielding a new unevaluated tail, which is a closure holding references back to 1st and 2nd elements of the last step. This process can continue infinitely :)
And because we're referring to prior values by name, GHC happily retains them in memory for us, so each item is evaluated only once.
Upvotes: 82
Reputation: 26157
When something is evaluated in Haskell, it stays evaluated, as long as it's referenced by the same name1.
In the following code, the list l
is only evaluated once (which might be obvious):
let l = [1..10]
print l
print l -- None of the elements of the list are recomputed
Even if something is partially evaluated, that part stays evaluated:
let l = [1..10]
print $ take 5 l -- Evaluates l to [1, 2, 3, 4, 5, _]
print l -- 1 to 5 is already evaluated; only evaluates 6..10
In your example, when an element of the fibs
list is evaluated, it stays evaluated. Since the arguments to zipWith
reference the actual fibs
list, it means that the zipping expression will use the already partially computed fibs
list when computing the next elements in the list. This means that no element is evaluated twice.
1This is of course not strictly required by the language semantics, but in practice this is always the case.
Upvotes: 19