MathematicalOrchid

Reputation: 62808

Controlling parallel execution

Haskell provides a par combinator, which queues a "spark" for possible evaluation in parallel with the current thread. It also provides a pseq combinator, which forces evaluation of pure code to occur in a specific order.
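
For concreteness, typical usage looks something like this (a sketch; expensive is just a stand-in for real work):

import Control.Parallel (par, pseq)

-- par sparks its first argument for possible parallel evaluation;
-- pseq forces its first argument (to WHNF) before returning its second.
both :: Int -> Int -> Int
both x y = a `par` (b `pseq` a + b)
  where
    a = expensive x  -- sparked: may be picked up by an idle core
    b = expensive y  -- forced here first, giving the spark time to run

expensive :: Int -> Int
expensive n = sum [1 .. n]  -- stand-in for real work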

What Haskell does not appear to provide is a way to generate several sparks and then wait for them all to finish. This is quite trivial to achieve with explicit concurrency, but appears to be impossible with pure sparks.
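
For reference, the explicit-concurrency version is only a few lines of IO. A sketch (parBoth is a made-up name):

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Evaluate two pure values on separate threads, then wait for both.
parBoth :: a -> b -> IO (a, b)
parBoth a b = do
    va <- newEmptyMVar
    vb <- newEmptyMVar
    _  <- forkIO (evaluate a >>= putMVar va)
    _  <- forkIO (evaluate b >>= putMVar vb)
    a' <- takeMVar va
    b' <- takeMVar vb
    return (a', b')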

In part, this is perhaps because of the intended use-case for sparks. They appear to be designed for speculative evaluation. That is, doing work which might be needed, but might not be. Hence, sparks only run on cores which are otherwise idle.

However, this is not my use-case. I have a bunch of results which I know, for a fact, will be needed soon. And if I start trying to process the results before the sparks have fired, I'll just end up single-threaded again with a bunch of fizzled sparks.

Of course, if par waited for sparks to complete, it wouldn't achieve any parallelism! But it would be really useful if there were some way to produce several sparks and then wait for them all to finish. I can't find any way to do that, though.

Does anybody have any useful suggestions? (Other than "use explicit concurrency", obviously.)

Upvotes: 12

Views: 567

Answers (2)

gereeter

Reputation: 4761

The Really Short Answer

You can't.

The Short Answer

To wait for a spark to finish, try to evaluate what the spark was evaluating. For example, if you have expressions a and b, to calculate a + b, you can do

a `par` b `par` (a + b)

or

a `par` b `pseq` (a + b)
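
(The difference: the first version sparks both a and b and then immediately starts evaluating a + b itself, so one of the sparks will usually fizzle; the second sparks only a and uses pseq to make the current thread evaluate b first, which gives the spark for a time to be picked up.)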

The Long Answer

When you create a spark by using par, you are telling the run time system "I will need this value later, so you should evaluate it in parallel." When you later need the value, either the spark has evaluated the expression or it hasn't. If it has, the thunk will be replaced by a value, so re-evaluation has no cost - it is just fetching the value. If it hasn't been evaluated by the spark, then waiting for the spark is useless - it might take a while to get scheduled, and the waiting thread would be wasting time. Instead of waiting, you should just evaluate the expression yourself. Essentially, there is no need to wait for a spark: you just try to evaluate the original expression and reap the performance benefits.

Also, a note on speculation - although sparks can be and often are used for speculation, that is not all they are designed for. I see par being used for simple parallelization, as in pfib below, much more often than I see it used for speculation.

Examples

A standard example is parallelizing the Fibonacci numbers, from the serial

fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

to the parallel

import Control.Parallel (par, pseq)

pfib 0 = 0
pfib 1 = 1
pfib n = l `par` r `pseq` (l + r) where
    l = pfib (n - 1)
    r = pfib (n - 2)

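In practice, sparking every recursive call like this creates an enormous number of tiny sparks, most of which overflow the spark pool or fizzle. A common refinement is to stop sparking once the work gets small - a sketch, with an arbitrary cutoff, reusing the serial fib from above:

import Control.Parallel (par, pseq)

-- Below the cutoff a spark costs more than the work it saves,
-- so fall back to the plain serial function.
pfib' :: Int -> Int
pfib' n
    | n < 20    = fib n
    | otherwise = l `par` r `pseq` (l + r)
  where
    l = pfib' (n - 1)
    r = pfib' (n - 2)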

Now for an example using speculation:

import Control.Parallel (par)

spec :: Eq a
    => a        -- a guess at the input value
    -> (a -> b) -- a function to transform the input value
    -> a        -- the actual input value - this will require computation
    -> b        -- the function applied to the input value
spec guess f input = let speculation = f guess in speculation `par`
    if guess == input
        then speculation
        else f input

The hackage package I got this from, speculation, actually has a couple of optimizations, like not doing this on a single core and checking whether the input is already evaluated, but those don't affect how the function works.
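
A hypothetical usage, where the guess happens to be right, so the sparked result is the one that gets used:

-- length [1 .. 1000] takes some work to compute, but we can guess
-- it will be 1000 and start squaring speculatively in the meantime.
example :: Int
example = spec 1000 (\x -> x * x) (length [1 .. 1000])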

Other Solutions that Make Things More Explicit

  • monad-par
  • Strategies, which make use of par (see the sketch below).
  • Messing with IO - forkIO, MVars, and so on. There are a lot of options here.
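
Since the question is really "spark several things, then wait for them all", it's worth showing how the Eval monad from Control.Parallel.Strategies expresses that directly - the standard rpar/rseq pattern (a sketch):

import Control.Parallel.Strategies (runEval, rpar, rseq)

-- rpar sparks each argument; rseq then forces each to WHNF, so by
-- the time runEval returns, both are evaluated - either by a spark
-- or, if no core got to it in time, by the current thread.
parPair :: a -> b -> (a, b)
parPair a b = runEval $ do
    a' <- rpar a
    b' <- rpar b
    _  <- rseq a'
    _  <- rseq b'
    return (a', b')

This is the "evaluate it yourself" idea from the long answer, packaged up.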

Upvotes: 6

Daniel Fischer

Reputation: 183873

You can try putting the results of the sparked computations into a strict data structure:

{-# LANGUAGE BangPatterns #-}
module Main where

import Control.Parallel

fib :: Int -> Int
fib n
    | n < 1     = 0
    | n == 1    = 1
    | otherwise = fib (n-1) + fib (n-2)

trib :: Int -> Int
trib n
    | n < 1     = 0
    | n < 3     = 1
    | otherwise = trib (n-1) + trib (n-2) + trib (n-3)

-- Strict fields: forcing an R to WHNF forces both results.
data R = R { res1, res2 :: !Int }

main :: IO ()
main = do
    -- The bang pattern forces r right here, and the strict fields
    -- in turn force a and b, so both computations are finished
    -- before the first print runs.
    let !r = let a = fib 38
                 b = trib 31
             in a `par` b `pseq` (R a b)
    print $ res1 r
    print $ fib 28
    print $ res2 r

That worked here:

$ ./spark +RTS -N2 -s
39088169
317811
53798080
          65,328 bytes allocated in the heap
           9,688 bytes copied during GC
           5,488 bytes maximum residency (1 sample(s))
          30,680 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         1 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s
  Gen  1         1 colls,     1 par    0.00s    0.00s     0.0001s    0.0001s

  Parallel GC work balance: 1.33 (686 / 515, ideal 2)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.59s    (  0.59s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    (  0.59s)       0.00s    (  0.00s)
  Task  2 (bound)  :    0.59s    (  0.59s)       0.00s    (  0.00s)
  Task  3 (worker) :    0.00s    (  0.59s)       0.00s    (  0.00s)

  SPARKS: 1 (1 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.17s  (  0.59s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    1.18s  (  0.59s elapsed)

  Alloc rate    55,464 bytes per MUT second

  Productivity  99.9% of total user, 199.1% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
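
(For the record: reproducing this needs the threaded runtime, i.e. compiling with something like ghc -O2 -threaded -rtsopts spark.hs, assuming the file is named spark.hs.)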

Upvotes: 6
