Marcus Riemer

Reputation: 7728

Parallel maximum computation

I have a set of problems that I would like to evaluate in parallel. These problems are expressed using a simple expression type very similar to this:

-- Expressions are either a constant value or two expressions
-- combined using a certain operation
data Expr 
    = Const NumType
    | Binary BinOp Expr Expr

-- The possible operations
data BinOp = Add | Sub | Mul | Div
    deriving (Eq)
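
For concreteness, a term such as (1 + 2) * 3 would be built like this (assuming NumType is a plain numeric alias such as Int, which is not shown in the snippet above):

-- Hypothetical example value: (1 + 2) * 3
example :: Expr
example = Binary Mul (Binary Add (Const 1) (Const 2)) (Const 3)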

These expressions are built on the fly and should evaluate to a certain result which may be valid or invalid. This is expressed as a monad to stop computation when encountering invalid results.

data Result a
    = Val { val :: a }
    | Exc { exc :: String }

instance Monad Result where
    return = Val
    (Exc e) >>= _ = (Exc e)
    (Val v) >>= g = g v
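
(Side note: on GHC 7.10 and later, Functor and Applicative are superclasses of Monad, so this instance only compiles if something like the following minimal sketch is added as well.)

instance Functor Result where
    fmap _ (Exc e) = Exc e
    fmap f (Val v) = Val (f v)

instance Applicative Result where
    pure = Val
    (Exc e) <*> _ = Exc e
    (Val f) <*> r = fmap f r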

To determine a value of each solved problem I have two relevant functions:

eval :: Expr -> Result NumType
score :: Expr -> NumType
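
The full eval lives in the gist mentioned in the update below; as a rough sketch of the idea, assuming NumType is Int and that division by zero is the invalid case, it looks something like this:

-- Sketch only: the real eval is in the linked gist.
eval :: Expr -> Result NumType
eval (Const n) = return n
eval (Binary op l r) = do
    a <- eval l
    b <- eval r
    case op of
        Add -> return (a + b)
        Sub -> return (a - b)
        Mul -> return (a * b)
        Div | b == 0    -> Exc "division by zero"
            | otherwise -> return (a `div` b)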

Finally, I have solve functions that return a [Expr]. This leads to my main function, which currently looks like this:

main :: IO ()
main = do
    strAvailableNumbers <- getLine
    strTargetNumber <- getLine
    let numbers = parseList strAvailableNumbers 
        target = parseTargetNumber strTargetNumber in
            sequence $ map (print) $ 
                solveHeuristic1 (Problem target numbers) [Add] [Sub] ++
                solveHeuristic2 (Problem target numbers) 

    return ()

The basic idea is that I read a list of numbers and a target number from stdin and then print expressions on stdout.

But I have two problems that I would like to solve, and I am not quite sure how related they are:

Update: Created a gist with complete source code.

Upvotes: 2

Views: 266

Answers (1)

daniel gratzer

Reputation: 53901

The problem here is that evaluating an IO action with seq does approximately nothing. So you're just running things sequentially with slightly more overhead.
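
(To illustrate that point with a hypothetical snippet that is not from the original code: seq only forces the IO action itself to weak head normal form, it never runs it.)

-- seq forces the IO action to WHNF but does not execute it,
-- so this prints only "done".
seqDemo :: IO ()
seqDemo = do
    putStrLn "never shown" `seq` return ()
    putStrLn "done"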

You can refactor things to make them pure again:

main :: IO ()
main = do
    mapM_ (`seq` print "found it") -- make sure we're not 
                                   -- benchmarking printing stuff
          . concat
          . parMap rdeepseq (solve [1..10000000])
          $ [42, 42]

    return ()
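
For completeness, this version relies on the parallel and deepseq packages; the imports it assumes are roughly:

import Control.Parallel.Strategies (parMap, rdeepseq)
import Control.DeepSeq (NFData(..), deepseq)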

And add NFData instances so we can use rdeepseq, which fully evaluates things:

instance NFData BinOp -- BinOp is just an enum, WHNF = NF

instance NFData Expr where
  rnf (Const a) = a `deepseq` ()
  rnf (Binary b e1 e2) = b `deepseq` e1 `deepseq` e2 `deepseq` ()
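
(Alternatively, with deepseq >= 1.4 and GHC's DeriveGeneric/DeriveAnyClass extensions, these instances can be derived instead of written by hand; a sketch, not part of the original answer:)

{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}
import GHC.Generics (Generic)
import Control.DeepSeq (NFData)

-- BinOp (and whatever NumType is) need NFData instances as well,
-- derivable the same way.
data Expr
    = Const NumType
    | Binary BinOp Expr Expr
    deriving (Generic, NFData)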

And now if we run it we get... a stack overflow. I bumped up the size of the search space enough to make the run take long enough to be worth benchmarking, and now fully loading both structures into memory blows the stack. Bumping the stack size up to the point where we don't blow everything up leaves us running about 40% faster (3 vs 5 seconds) with -N2 than without, which I would consider the expected result. Visually, when running this I can see 2 cores briefly jump up to 100%.

Final compilation and run sequence:

> ghc -O2 -threaded -rtsopts bench.hs
> ./bench +RTS -K10000000 -N2

Upvotes: 3
