Reputation: 545
I am trying to solve a 2-sum algorithm problem for Standford university online course on coursera. I need to find all distinct pairs x+y
in a list that sum to a value t
in a range [-10000 .. 10000]
. I know there more efficient implementations but I thought it would be a good time to try and do some Haskell parallel programming.
I have tried to implement parellelisation just by looping through half of the range in two different threads (which I think are called sparks). My code is the following:
module Main where
import Data.List
import qualified Data.Map as M
import Debug.Trace
import Control.Parallel (par,pseq)
main :: IO ()
main = interact run
range :: [Int]
range = [negate 10000..10000]
emptyMap :: M.Map Int Bool
emptyMap = M.fromList $ zip [] []
run :: String -> String
run xs = let parsedInput = map (read :: String -> Int) $ words xs
hashMap = M.fromList $ zip parsedInput (repeat True)
pcalc r = map (\t -> trace (show t) (countVals hashMap parsedInput t)) r
bot = pcalc (take (div (length range) 2) range)
top = pcalc (drop (div (length range) 2) range)
out = top `par` bot `pseq` (sum bot + sum top)
in show out
countVals :: M.Map Int Bool -> [Int] -> Int -> Int
countVals m ks t = foldl' go 0 ks
where go acum x = if M.lookup y m == Just True
&& y /= x
then 1
else acum
where y = t - x
You can see I have two variables top
and bot
which I am trying to calculate in parallel via
out = top `par` bot `pseq` (sum bot + sum top)
which is what I thought other stack overflow answers are recommending. However when I compile and run I only seem to see the trace from the bot
variable.
% stack ghc --package parallel -- -threaded Main.hs
[1 of 1] Compiling Main ( Main.hs, Main.o )
Linking Main ...
% ./Main +RTS -N8 < input.txt
-10000
-9999
-9998
-9997
-9996
...
Whereas I was expecting something like:
% ./Main +RTS -N8 < input.txt
-10000
0
-9999
1
-9998
2
-9997
-9996
...
Can someone help point out what exactly I am doing wrong? Thanks
Upvotes: 3
Views: 163
Reputation: 116139
Let's focus on this part:
bot = pcalc (take (div (length range) 2) range)
top = pcalc (drop (div (length range) 2) range)
out = top `par` bot `pseq` (sum bot + sum top)
Here, bot
and top
are lists. When we seq
, pseq
or par
a value we cause it to be evaluated; since Haskell is lazy, evaluation stops when the "weak head normal form" is reached, i.e. until the first constructor appears in the result. For list values, this means that they are reduced to either []
or unevaluatedHead : unevaluatedTail
.
Because of this, top `par` bot `pseq` ...
only parallelizes the evaluation of the first cell of the lists, and not their full contents. The whole lists will only get evaluated after pseq
when we sum them, but that is run on only one core.
To force the code to be parallel, we can parallelize the sums instead:
sumBot = sum bot
sumTop = sum top
out = sumBot `par` sumTop `pseq` sumBot + sumTop
Since evaluating the sums to WHNF requires evaluating the whole list, this should properly parallelize the computation.
Upvotes: 3