Reputation: 41919
Given:
module Main where

import Control.Parallel.Strategies
import Control.Applicative

--main :: IO ()
--main = putStrLn . show $ spark [1..40]

main :: IO ()
main = putStrLn . show . runEval $ splitIt [1..40]

fib :: Int -> Int
fib x
  | x <= 1    = 1
  | otherwise = fib (x-1) + fib (x-2)

spark :: [Int] -> [Int]
spark = parMap rpar fib

splitIt :: [Int] -> Eval [Int]
splitIt xs = let len      = length xs
                 (as, bs) = splitAt (len `div` 2) xs
             in do
                  xs' <- fibPar as
                  ys  <- fibPar bs
                  return $ xs' ++ ys

fibPar :: [Int] -> Eval [Int]
fibPar [] = return []
fibPar (x:xs) = do
  a  <- rpar $ fib x
  as <- fibPar xs
  return $ a : as
I wrote two ways to calculate Fibonacci for each element of [1..40]. Taking from Parallel and Concurrent Programming in Haskell, I ran the Fibonacci computations in parallel in two ways:

(1) use parMap on the entire list (the first main)
(2) cut the list in half and spark the work for each half with rpar (the second main)
From reading the aforementioned text, I would've expected #1 to be faster:
This illustrates an important principle when parallelizing code: Try to avoid partitioning the work into a small, fixed number of chunks.
I compiled and ran both versions (including only one main at a time, commenting out the other) via:
ghc -O2 Fib.hs -threaded -rtsopts -eventlog
.\Fib.exe +RTS -N2 -s
Here are the results for (1) and (2), respectively:
(1) - use parMap
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 2 colls, 1 par 0.000s 0.000s 0.0001s 0.0001s
Parallel GC work balance: 84.39% (serial 0%, perfect 100%)
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)
SPARKS: 80 (74 converted, 0 overflowed, 0 dud, 0 GC'd, 6 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 8.594s ( 4.331s elapsed)
GC time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 8.594s ( 4.332s elapsed)
Alloc rate 12,259 bytes per MUT second
Productivity 100.0% of total user, 198.4% of total elapsed
(2) - split list + use rpar on each half
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 2 colls, 1 par 0.000s 0.000s 0.0002s 0.0003s
Parallel GC work balance: 12.41% (serial 0%, perfect 100%)
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)
SPARKS: 40 (10 converted, 0 overflowed, 0 dud, 0 GC'd, 30 fizzled)
INIT time 0.000s ( 0.001s elapsed)
MUT time 7.453s ( 3.751s elapsed)
GC time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 7.453s ( 3.752s elapsed)
Alloc rate 14,398 bytes per MUT second
Productivity 100.0% of total user, 198.6% of total elapsed
Why wasn't the parMap version faster than the split-up + rpar version, as I understood the text to suggest?
Upvotes: 2
Views: 223
Reputation: 32319
I'm not sure if what follows is the only thing that is affecting the timing, but it definitely plays a big role.
With a list, it is inefficient to split the work like this.

Remember that sparking a parallel computation takes very little work, so the fastest way to start executing something for each element of the list is simply to run through the elements one after another and spark each with rpar. That's what parMap does.
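For illustration, here is a minimal sketch of that one-pass approach: a hand-rolled traversal that sparks each element as it walks the list. This is essentially your fibPar generalized; the real parMap in Control.Parallel.Strategies is defined in terms of parList, but the idea is the same.

import Control.Parallel.Strategies

-- Sketch of parMap's one-pass behavior: spark each element
-- as we walk the list; no splitting, no re-appending.
sparkEach :: (a -> b) -> [a] -> Eval [b]
sparkEach _ []     = return []
sparkEach f (x:xs) = do
  y  <- rpar (f x)      -- spark work for this element and move on
  ys <- sparkEach f xs
  return (y : ys)

-- runEval (sparkEach fib [1..40]) behaves like parMap rpar fib [1..40]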
In your case, splitAt is much more work: it needs to traverse half the list and then allocate space for another list. You might as well have sparked the fib computations during that traversal instead.
To see what I mean, try replacing [1..40] with replicate 1000 35. This workload is much more parallelizable: lots of reasonably difficult problems, all of the same difficulty. With a 1000-element list, splitIt runs in over 100 seconds while spark runs in under 1 second. Your solution ends up spending the vast majority of its time splitting and appending lists rather than computing anything.
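Concretely, the experiment amounts to swapping the input in the question's two entry points (a sketch against the code above):

main :: IO ()
main = putStrLn . show $ spark (replicate 1000 35)   -- one pass, one spark per element

-- versus:
-- main = putStrLn . show . runEval $ splitIt (replicate 1000 35)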
Upvotes: 0
Reputation: 52049
First note that the work needed to compute fib n is exponential in n. That means that computing map fib [1..n] takes about the same amount of time as computing fib (n+1): the cost of fib i grows roughly geometrically, so the sum of the costs for i = 1..n is dominated by the last few terms. To see this, just print out the time it takes to compute fib n for various values of n:
import System.TimeIt
import Control.Monad

... -- fib as defined in the question

main = forM_ [1..40] $ \n -> timeIt $ print (fib n)
To compute map fib [1..40] efficiently with two threads you want to equalize the amount of work done by each thread as much as possible. It turns out that one such division of labor which works pretty well is to have one thread compute map fib [1..38] and the other compute [fib 39, fib 40].
If you create a spark for each fib i computation, the division of labor between the two threads is completely non-deterministic. To equalize the work done by each thread you actually want to carefully craft what the sparks are.
Now look at the number of sparks created in your two programs: 80 for one and 40 for the other. So clearly each fib i is getting sparked, which means that in both cases the fib i computations are getting assigned randomly to the two threads.
Here is a way of getting a speedup of about 1.5 with two threads:
import Control.Parallel.Strategies

fib :: Int -> Int
fib x
  | x <= 1    = 1
  | otherwise = fib (x-1) + fib (x-2)

main :: IO ()
main = do
  let fs = map fib [1..40] `using` parListSplitAt 38 rdeepseq rdeepseq
  print fs
If you look at the RTS summary you'll see that it only creates two sparks: one for map fib [1..38] and the other for map fib [39,40].
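For intuition, here is a roughly equivalent way to spell out those two sparks explicitly with rpar (a sketch, assuming fib as defined above, and using force from Control.DeepSeq so that each spark evaluates its whole chunk rather than just to weak head normal form):

import Control.DeepSeq
import Control.Parallel.Strategies

-- Two sparks, one per chunk, each forced to normal form:
twoChunks :: [Int]
twoChunks = runEval $ do
  as <- rpar (force (map fib [1..38]))
  bs <- rpar (force (map fib [39, 40]))
  _  <- rseq as
  _  <- rseq bs
  return (as ++ bs)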
About the 80 sparks: if you use parMap rseq instead of parMap rpar, the number of sparks created drops down to 40. So clearly parMap rpar is creating a spark which itself just creates another spark, which is completely redundant. In general I would stick to rdeepseq as an evaluation strategy - it's just simpler, easier to reason about, and less error-prone.
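For example, a one-spark-per-element variant of the question's spark using rdeepseq would look like this (a sketch, assuming fib as defined above; for plain Int results rdeepseq and rseq evaluate the same amount, but rdeepseq stays correct if the element type later becomes a deeper structure):

import Control.Parallel.Strategies

-- One spark per element, each result fully evaluated:
spark' :: [Int] -> [Int]
spark' = parMap rdeepseq fib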
Upvotes: 3