Reputation: 41919
Given:
module Main where

import Control.Parallel.Strategies
import Control.Applicative

--main :: IO ()
--main = putStrLn . show $ spark [1..40]

main :: IO ()
main = putStrLn . show . runEval $ splitIt [1..40]

fib :: Int -> Int
fib x
  | x <= 1    = 1
  | otherwise = fib (x-1) + fib (x-2)

spark :: [Int] -> [Int]
spark = parMap rpar fib

splitIt :: [Int] -> Eval [Int]
splitIt xs = let len      = length xs
                 (as, bs) = splitAt (len `div` 2) xs
             in do
                  xs' <- fibPar as
                  ys  <- fibPar bs
                  return $ xs' ++ ys

fibPar :: [Int] -> Eval [Int]
fibPar [] = return []
fibPar (x:xs) = do
  a  <- rpar $ fib x
  as <- fibPar xs
  return $ a : as
I wrote two ways to calculate Fibonacci for each element of [1..40]. Taking from Parallel and Concurrent Programming in Haskell, I ran the Fibonacci computations in parallel in two ways:

(1) use parMap on the entire list (the first main)
(2) cut the list in half and spark the work for each half with rpar (the second main)
From reading the aforementioned text, I would've expected #1 to be faster:
This illustrates an important principle when parallelizing code: Try to avoid partitioning the work into a small, fixed number of chunks.
I compiled and ran both versions (including only one main at a time, commenting out the other) via:
ghc -O2 Fib.hs -threaded -rtsopts -eventlog
.\Fib.exe +RTS -N2 -s
Here are the results for (1) and (2), respectively:
(1) - use parMap
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 2 colls, 1 par 0.000s 0.000s 0.0001s 0.0001s
Parallel GC work balance: 84.39% (serial 0%, perfect 100%)
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)
SPARKS: 80 (74 converted, 0 overflowed, 0 dud, 0 GC'd, 6 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 8.594s ( 4.331s elapsed)
GC time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 8.594s ( 4.332s elapsed)
Alloc rate 12,259 bytes per MUT second
Productivity 100.0% of total user, 198.4% of total elapsed
(2) - split list + use rpar on each half
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 2 colls, 1 par 0.000s 0.000s 0.0002s 0.0003s
Parallel GC work balance: 12.41% (serial 0%, perfect 100%)
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)
SPARKS: 40 (10 converted, 0 overflowed, 0 dud, 0 GC'd, 30 fizzled)
INIT time 0.000s ( 0.001s elapsed)
MUT time 7.453s ( 3.751s elapsed)
GC time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 7.453s ( 3.752s elapsed)
Alloc rate 14,398 bytes per MUT second
Productivity 100.0% of total user, 198.6% of total elapsed
Why wasn't the parMap version faster than the split-up + rpar version, as I understood the text to suggest?
Upvotes: 2
Views: 223
Reputation: 32319
I'm not sure if what follows is the only thing that is affecting the timing, but it definitely plays a big role.
With a list, it is inefficient to split the work like this.

Remember that sparking a parallel computation takes very little work, so the fastest way to start executing something for each element of the list is simply to run through the elements one after another and spark each with rpar. That's what parMap does.
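For illustration, here is a minimal sketch of that one-pass approach: a hand-rolled traversal that sparks each element as it walks the list. This is essentially your fibPar generalized; the real parMap in Control.Parallel.Strategies is defined in terms of parList, but the idea is the same.

import Control.Parallel.Strategies

-- Sketch of parMap's one-pass behavior: spark each element
-- as we walk the list; no splitting, no re-appending.
sparkEach :: (a -> b) -> [a] -> Eval [b]
sparkEach _ []     = return []
sparkEach f (x:xs) = do
  y  <- rpar (f x)      -- spark work for this element and move on
  ys <- sparkEach f xs
  return (y : ys)

-- runEval (sparkEach fib [1..40]) behaves like parMap rpar fib [1..40]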
In your case, splitAt is much more work: it needs to traverse half the list and then allocate space for another list. You might as well have sparked the fib computations during that traversal instead.
To see what I mean, try replacing [1..40] with replicate 1000 35. This workload is much more parallelizable: lots of reasonably difficult problems, all of the same difficulty. With a 1000-element list, splitIt runs in over 100 seconds while spark runs in under 1 second. Your solution ends up spending the vast majority of its time splitting and appending lists rather than computing anything.
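Concretely, the experiment amounts to swapping the input in the question's two entry points (a sketch against the code above):

main :: IO ()
main = putStrLn . show $ spark (replicate 1000 35)   -- one pass, one spark per element

-- versus:
-- main = putStrLn . show . runEval $ splitIt (replicate 1000 35)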
Upvotes: 0
Reputation: 52049
First note that the work needed to compute fib n is exponential in n. That means that computing map fib [1..n] takes about the same amount of time as computing fib (n+1): the cost of fib i grows roughly geometrically, so the sum of the costs for i = 1..n is dominated by the last few terms. To see this, just print out the time it takes to compute fib n for various values of n:
import System.TimeIt
import Control.Monad

... -- fib as defined in the question

main = forM_ [1..40] $ \n -> timeIt $ print (fib n)
To compute map fib [1..40] efficiently with two threads you want to equalize the amount of work done by each thread as much as possible. It turns out that one such division of labor which works pretty well is to have one thread compute map fib [1..38] and the other compute [fib 39, fib 40].
If you create a spark for each fib i computation, the division of labor between the two threads is completely non-deterministic. To equalize the work done by each thread you actually want to carefully craft what the sparks are.
Now look at the number of sparks created in your two programs: 80 for one and 40 for the other. So clearly each fib i is getting sparked, which means that in both cases the fib i computations are getting assigned randomly to the two threads.
Here is a way of getting a speedup of about 1.5 with two threads:
import Control.Parallel.Strategies

fib :: Int -> Int
fib x
  | x <= 1    = 1
  | otherwise = fib (x-1) + fib (x-2)

main :: IO ()
main = do
  let fs = map fib [1..40] `using` parListSplitAt 38 rdeepseq rdeepseq
  print fs
If you look at the RTS summary you'll see that it only creates two sparks: one for map fib [1..38] and the other for map fib [39,40].
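For intuition, here is a roughly equivalent way to spell out those two sparks explicitly with rpar (a sketch, assuming fib as defined above, and using force from Control.DeepSeq so that each spark evaluates its whole chunk rather than just to weak head normal form):

import Control.DeepSeq
import Control.Parallel.Strategies

-- Two sparks, one per chunk, each forced to normal form:
twoChunks :: [Int]
twoChunks = runEval $ do
  as <- rpar (force (map fib [1..38]))
  bs <- rpar (force (map fib [39, 40]))
  _  <- rseq as
  _  <- rseq bs
  return (as ++ bs)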
About the 80 sparks: if you use parMap rseq instead of parMap rpar, the number of sparks created drops down to 40. So clearly parMap rpar is creating a spark which itself just creates another spark, which is completely redundant. In general I would stick to rdeepseq as an evaluation strategy - it's just simpler, easier to reason about, and less error-prone.
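For example, a one-spark-per-element variant of the question's spark using rdeepseq would look like this (a sketch, assuming fib as defined above; for plain Int results rdeepseq and rseq evaluate the same amount, but rdeepseq stays correct if the element type later becomes a deeper structure):

import Control.Parallel.Strategies

-- One spark per element, each result fully evaluated:
spark' :: [Int] -> [Int]
spark' = parMap rdeepseq fib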
Upvotes: 3