Simon
Simon

Reputation: 3110

Does `par` create another thread?

My understanding to par is that it will create a thread in another core to execution.

But I failed proof this understanding with following test code since the result showing seems only one thread is running.

Could you help me to figure out what's wrong here?

import Control.Monad
import Control.Parallel
import Control.Concurrent
import System.IO.Unsafe

fib :: Int -> Int 
fib 0 = 1 
fib 1 = 1 
fib n = (fib (n-1)) + (fib (n - 2))

test :: String -> [Int] -> IO () 
test _ [] = return () 
test name (a:xs) = do 
    tid <- myThreadId 
    print $ (show tid) ++ "==>" ++ (show a) ++ "==>" ++ (show $ fib a) ++ "==>" ++ name
    x `par` y 
  where x = test "2" xs
        y = test "3" (tail xs)

main = test "1" [10..35]

Compiled with:

ghc --make -threaded -rtsopts test-par.hs
time ./test-par +RTS -N2

The result

"ThreadId 3==>10==>89==>1"
"ThreadId 3==>12==>233==>3"
"ThreadId 3==>14==>610==>3"
"ThreadId 3==>16==>1597==>3"
"ThreadId 3==>18==>4181==>3"
"ThreadId 3==>20==>10946==>3"
"ThreadId 3==>22==>28657==>3"
"ThreadId 3==>24==>75025==>3"
"ThreadId 3==>26==>196418==>3"
"ThreadId 3==>28==>514229==>3"
"ThreadId 3==>30==>1346269==>3"
"ThreadId 3==>32==>3524578==>3"
"ThreadId 3==>34==>9227465==>3"

real    0m1.131s
user    0m0.668s
sys 0m0.492s

How many core I have?

cat /proc/cpuinfo | grep processor | wc -l
2

-------------------------------- update

I think this paper by Simon Marlow is a good reference for such newbie question.

Upvotes: 2

Views: 358

Answers (1)

Don Stewart
Don Stewart

Reputation: 137997

No, par does not guarantee to create another thread.

It registers a spark in the runtime, which may cause the computation to be executed in a different thread, depending on the dynamic workload of the machine.

Creating a spark is very cheap, so you can create many (1000x) more than you have threads, and the runtime will just try to keep all your cores busy.

In your case, your x computation is registered as a spark, and then immediately discarded (you never refer to it again). So the garbage collector can remove it.

To parallelize a recursive function, you'll typically want to just use par up to some depth.

An example -- a recursive function with a cutoff depth:

import Control.Parallel
import Control.Monad
import Text.Printf

cutoff = 35

fib' :: Int -> Integer
fib' 0 = 0
fib' 1 = 1
fib' n = fib' (n-1) + fib' (n-2)

fib :: Int -> Integer
fib n | n < cutoff = fib' n
      | otherwise  = r `par` (l `pseq` l + r)
 where
    l = fib (n-1)
    r = fib (n-2)

main = forM_ [0..45] $ \i ->
            printf "n=%d => %d\n" i (fib i)

Run as:

$ ghc -O2 -threaded --make A.hs -rtsopts -fforce-recomp
$ ./A +RTS -N16 -s

Yields a parallel workload 1248.9% over sequential (i.e. 12.48x):

                                   Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6264 colls,  6263 par    5.23s    0.41s     0.0001s    0.0109s
  Gen  1         1 colls,     1 par    0.00s    0.00s     0.0002s    0.0002s

  Parallel GC work balance: 7.09 (1797194 / 253525, ideal 16)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    7.56s    (  9.36s)       0.92s    (  0.89s)
  Task  1 (worker) :    0.12s    ( 10.21s)       0.02s    (  0.05s)
  Task  2 (bound)  :    8.39s    (  9.51s)       0.70s    (  0.74s)
  Task  3 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
  Task  4 (worker) :    7.17s    (  9.60s)       0.53s    (  0.66s)
  Task  5 (worker) :    7.28s    (  9.55s)       0.56s    (  0.71s)
  Task  6 (worker) :    7.48s    (  9.52s)       0.56s    (  0.74s)
  Task  7 (worker) :    7.11s    (  9.54s)       0.66s    (  0.72s)
  Task  8 (worker) :    7.41s    (  9.62s)       0.70s    (  0.64s)
  Task  9 (worker) :    7.69s    (  9.48s)       0.66s    (  0.78s)
  Task 10 (worker) :    7.56s    (  9.51s)       0.56s    (  0.75s)
  Task 11 (worker) :    7.69s    (  9.42s)       0.86s    (  0.84s)
  Task 12 (worker) :    7.42s    (  9.40s)       0.92s    (  0.86s)
  Task 13 (worker) :    7.28s    (  9.39s)       0.91s    (  0.86s)
  Task 14 (worker) :    7.44s    (  9.38s)       0.91s    (  0.87s)
  Task 15 (worker) :    7.25s    (  9.33s)       1.11s    (  0.93s)
  Task 16 (worker) :    7.94s    (  9.33s)       0.97s    (  0.93s)
  Task 17 (worker) :    7.59s    (  9.37s)       1.06s    (  0.88s)

  SPARKS: 597 (446 converted, 0 dud, 1 GC'd, 150 fizzled)

  Productivity  96.1% of total user, 1245.3% of total elapsed

We created 597 sparks, of which 446 were converted into threads.

If you explicitly want to do manual thread creation and communication, this can be done via forkIO and MVars.

Upvotes: 13

Related Questions