ljedrz
ljedrz

Reputation: 22173

Haskell Fibonacci sequence performance depending on methodology

I was trying out different approaches to getting a number at a given index of the Fibonacci sequence and they could basically be divided into two categories:

I picked an example of both:

fibs1 :: Int -> Integer
fibs1 n = fibs1' !! n
    where fibs1' = 0 : scanl (+) 1 fibs1'

fib2 :: Int -> Integer
fib2 n = fib2' 1 1 n where
    fib2' _ b 2 = b
    fib2' a b n = fib2' b (a + b) (n - 1)

fibs1:

real    0m2.356s
user    0m2.310s
sys     0m0.030s

fibs2:

real    0m0.671s
user    0m0.667s
sys     0m0.000s

Both were compiled with 64bit GHC 7.6.1 and -O2 -fllvm. Their core dumps are very similar in length, but they differ in the parts that I'm not very proficient at interpreting.

I was not surprised that fibs1 failed for n = 350000 (Stack space overflow). However, I am not comfortable with the fact that it used that much memory.

I would like to clear some things up:

  1. Why does the GC not take care of the beginning of the list throughout computation even though most of it quickly becomes useless?
  2. Why does GHC not optimize the list version to a variable version since only two of its elements are required at once?

EDIT: Sorry, I mixed the speed results, fixed. Two of three of my doubts are still valid, though ;).

Upvotes: 2

Views: 287

Answers (1)

Daniel Fischer
Daniel Fischer

Reputation: 183858

Why does the GC not take care of the beginning of the list throughout computation even though most of it quickly becomes useless?

fibs1 uses a lot of memory and is slow because scanl is lazy, it doesn't evaluate the list elements, so

fibs1' = 0 : scanl (+) 1 fibs1'

produces

0 : scanl (+) 1 (0 : more)
0 : 1 : let f2 = 1+0 in scanl (+) f2 (1 : more')
0 : 1 : let f2 = 1+0 in f2 : let f3 = f2+1 in scanl (+) f3 (f2 : more'')
0 : 1 : let f2 = 1+0 in f2 : let f3 = f2+1 in f3 : let f4 = f3+f2 in scanl (+) f4 (f3 : more''')

etc. So you rather quickly get a huge nested thunk. When that thunk is evaluated, it is pushed on the stack, and at some point between 250000 and 350000, it becomes too big for the default stack.

And since each list element holds a reference to the previous while it is not evaluated, the beginning of the list cannot be garbage-collected.

If you use a strict scan,

fibs1 :: Int -> Integer
fibs1 n = fibs1' !! n
  where
    fibs1' = 0 : scanl' (+) 1 fibs1'
    scanl' f a (x:xs) = let x' = f a x in x' `seq` (a : scanl' f x' xs)
    scanl' _ a [] = [a]

when the k-th list cell is produced, its value is already evaluated, so doesn't refer to a previous, hence the list can be garbage collected (assuming nothing else holds a reference to it) as it is traversed.

With that implementation, the list version is about as fast and lean as fib2 (it needs to allocate list cells nevertheless, so it allocates a small bit more, and is possibly a tiny bit slower therefore, but the difference is minute, since the Fibonacci numbers become so large that the list construction overhead becomes negligible).

The idea of scanl is that its result is incrementally consumed, so that the consumption forces the elements and prevents the build-up of large thunks.

Why does GHC not optimize the list version to a variable version since only two of its elements are required at once?

Its optimiser can't see through the algorithm to determine that. scanl is opaque to the compiler, it doesn't know what scanl does.

If we take the exact source code for scanl (renaming it or hiding scanl from the Prelude, I opted for renaming),

scans                   :: (b -> a -> b) -> b -> [a] -> [b]
scans f q ls            =  q : (case ls of
                                []   -> []
                                x:xs -> scans f (f q x) xs)

and compile the module exporting it (with -O2), and then look at the generated interface file with

ghc --show-iface Scan.hi

we get (for example, minor differences between compiler versions)

Magic: Wanted 33214052,
       got    33214052
Version: Wanted [7, 0, 6, 1],
         got    [7, 0, 6, 1]
Way: Wanted [],
     got    []
interface main:Scan 7061
  interface hash: ef57dac14815e2f1f897b42a007c0c81
  ABI hash: 8cfc8dab79de6a51fcad666f1869574f
  export-list hash: 57d6805e5f0b5f76f0dd8dfb228df988
  orphan hash: 693e9af84d3dfcc71e640e005bdc5e2e
  flag hash: 1e8135cb44ef6dd330f1f56943d1f463
  used TH splices: False
  where
exports:
  Scan.scans
module dependencies:
package dependencies: base* ghc-prim integer-gmp
orphans: base:GHC.Base base:GHC.Float base:GHC.Real
family instance modules:
import  -/  base:Prelude 1cb4b618cf45281dc97748b1831bf0cd
d79ca4e223c0de0a770a3b88a5e67687
  scans :: forall b a. (b -> a -> b) -> b -> [a] -> [b]
    {- Arity: 3, HasNoCafRefs, Strictness: LLL -}
vectorised variables:
vectorised tycons:
vectorised reused tycons:
scalar variables:
scalar tycons:
trusted: safe-inferred
require own pkg trusted: False

and see that the interface file doesn't expose the unfolding of the function, only its type, arity, strictness and that it doesn't refer to CAFs.

When a module importing that is compiled, all that the compiler has to go by is the information exposed by the interface file.

Here, there is no information exposed that would allow the compiler to do anything else but emit a call to the function.

If the unfolding were exposed, the compiler had a chance to inline the unfolding and analyse the code knowing the types and combination function to produce more eager code that doesn't build thunks.

The semantics of scanl, however, are maximally lazy, each element of the output is emitted before the input list is inspected. That has the consequence that GHC can't make the addition strict, since that would change the result if the list contained any undefined values:

scanl (+) 1 [undefined] = 1 : scanl (+) (1 + undefined) [] = 1 : (1 + undefined) : []

while

scanl' (+) 1 [undefined] = let x' = 1 + undefined in x' `seq` 1 : scanl' (+) x' []
                         = *** Exception: Prelude.undefined

One could make a variant

scanl'' f b (x:xs) = b `seq` b : scanl'' f (f b x) xs

that would produce 1 : *** Exception: Prelude.undefined for the above input, but any strictness would indeed change the result if the list contained undefined values, so even if the compiler knew the unfolding, it couldn't make the evaluation strict - unless it could prove that there are no undefined values in the list, a fact that is obvious to us, but not the compiler [and I don't think it would be easy to teach a compiler recognize that and be able to prove the absence of undefined values].

Upvotes: 4

Related Questions