Marcus Riemer

Reputation: 7728

Parallel maximum computation

I have a set of problems that I would like to evaluate in parallel. These problems are expressed using a simple expression type very similar to this:

-- Expressions are either a constant value or two expressions
-- combined using a certain operation
data Expr 
    = Const NumType
    | Binary BinOp Expr Expr

-- The possible operations
data BinOp = Add | Sub | Mul | Div
    deriving (Eq)
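
For concreteness, a term such as (1 + 2) * 3 would be built like this (assuming NumType is a plain numeric alias such as Int, which is not shown in the snippet above):

-- Hypothetical example value: (1 + 2) * 3
example :: Expr
example = Binary Mul (Binary Add (Const 1) (Const 2)) (Const 3)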

These expressions are built on the fly and should evaluate to a certain result which may be valid or invalid. This is expressed as a monad to stop computation when encountering invalid results.

data Result a
    = Val { val :: a }
    | Exc { exc :: String }

instance Monad Result where
    return = Val
    (Exc e) >>= _ = (Exc e)
    (Val v) >>= g = g v
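
(Side note: on GHC 7.10 and later, Functor and Applicative are superclasses of Monad, so this instance only compiles if something like the following minimal sketch is added as well.)

instance Functor Result where
    fmap _ (Exc e) = Exc e
    fmap f (Val v) = Val (f v)

instance Applicative Result where
    pure = Val
    (Exc e) <*> _ = Exc e
    (Val f) <*> r = fmap f r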

To determine a value of each solved problem I have two relevant functions:

eval :: Expr -> Result NumType
score :: Expr -> NumType
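
The full eval lives in the gist mentioned in the update below; as a rough sketch of the idea, assuming NumType is Int and that division by zero is the invalid case, it looks something like this:

-- Sketch only: the real eval is in the linked gist.
eval :: Expr -> Result NumType
eval (Const n) = return n
eval (Binary op l r) = do
    a <- eval l
    b <- eval r
    case op of
        Add -> return (a + b)
        Sub -> return (a - b)
        Mul -> return (a * b)
        Div | b == 0    -> Exc "division by zero"
            | otherwise -> return (a `div` b)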

Finally, I have solve functions that return a [Expr]. This leads to my main function, which currently looks like this:

main :: IO ()
main = do
    strAvailableNumbers <- getLine
    strTargetNumber <- getLine
    let numbers = parseList strAvailableNumbers 
        target = parseTargetNumber strTargetNumber in
            sequence $ map (print) $ 
                solveHeuristic1 (Problem target numbers) [Add] [Sub] ++
                solveHeuristic2 (Problem target numbers) 

    return ()

The basic idea is that I read a list of numbers and a target number from stdin and then print expressions on stdout.

But I have two problems that I would like to solve, and I am not quite sure how related they are:

Update: Created a gist with complete source code.

Upvotes: 2

Views: 266

Answers (1)

daniel gratzer

Reputation: 53901

The problem here is that evaluating an IO action with seq does approximately nothing. So you're just running things sequentially with slightly more overhead.
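
(To illustrate that point with a hypothetical snippet that is not from the original code: seq only forces the IO action itself to weak head normal form, it never runs it.)

-- seq forces the IO action to WHNF but does not execute it,
-- so this prints only "done".
seqDemo :: IO ()
seqDemo = do
    putStrLn "never shown" `seq` return ()
    putStrLn "done"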

You can refactor things to make them pure again:

main :: IO ()
main = do
    mapM_ (`seq` print "found it") -- make sure we're not 
                                   -- benchmarking printing stuff
          . concat
          . parMap rdeepseq (solve [1..10000000])
          $ [42, 42]

    return ()
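
For completeness, this version relies on the parallel and deepseq packages; the imports it assumes are roughly:

import Control.Parallel.Strategies (parMap, rdeepseq)
import Control.DeepSeq (NFData(..), deepseq)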

And add NFData instances so we can use rdeepseq, which fully evaluates things:

instance NFData BinOp -- BinOp is just an enum, WHNF = NF

instance NFData Expr where
  rnf (Const a) = a `deepseq` ()
  rnf (Binary b e1 e2) = b `deepseq` e1 `deepseq` e2 `deepseq` ()
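
(Alternatively, with deepseq >= 1.4 and GHC's DeriveGeneric/DeriveAnyClass extensions, these instances can be derived instead of written by hand; a sketch, not part of the original answer:)

{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}
import GHC.Generics (Generic)
import Control.DeepSeq (NFData)

-- BinOp (and whatever NumType is) need NFData instances as well,
-- derivable the same way.
data Expr
    = Const NumType
    | Binary BinOp Expr Expr
    deriving (Generic, NFData)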

And now if we run it we get... a stack overflow. I bumped up the size of the search space enough to make the run take long enough to be worth benchmarking, and now fully loading both structures into memory blows the stack. Bumping the stack size up to the point where we don't blow everything up leaves us running about 40% faster (3 vs 5 seconds) with -N2 than without, which I would consider the expected result. Visually, when running this I can see 2 cores briefly jump up to 100%.

Final compilation and run sequence:

> ghc -O2 -threaded -rtsopts bench.hs
> ./bench +RTS -K10000000 -N2

Upvotes: 3
