Reputation: 35099
I have a finite list of values:
values :: [A]
... and an expensive, but pure, function on those values:
expensiveFunction :: A -> Maybe B
How do I run that function on each value in parallel and only return the first n
results that complete with a Just
and stop computation of the unfinished results?
takeJustsPar :: (NFData b) => Int -> (a -> Maybe b) -> [a] -> [b]
takeJustsPar maxJusts f as = ???
I know how I would do this using Control.Concurrent
, but I wanted to experiment using Haskell's parallelism features. Also, the (scant) literature I could find seems to indicate that Haskell's parallelism features make it cheaper to spawn parallel computations and adapt the workload among the number of capabilities.
Upvotes: 4
Views: 186
Reputation: 35099
I attempted two solutions. The first uses the Par
monad (i.e. Control.Monad.Par
):
import Control.Monad.Par (Par, NFData)
import Control.Monad.Par.Combinator (parMap)
import Data.Maybe (catMaybes)
import Data.List.Split (chunksOf)
takeJustsPar :: (NFData b) => Int -> Int -> (a -> Maybe b) -> [a] -> Par [b]
takeJustsPar n chunkSize f as = go n (chunksOf chunkSize as) where
go _ [] = return []
go 0 _ = return []
go numNeeded (chunk:chunks) = do
evaluatedChunk <- parMap f chunk
let results = catMaybes evaluatedChunk
numFound = length results
numRemaining = numNeeded - numFound
fmap (results ++) $ go numRemaining chunks
The second attempt used Control.Parallel.Strategies
:
import Control.Parallel.Strategies
import Data.List.Split (chunksOf)
chunkPar :: (NFData a) => Int -> Int -> [a] -> [a]
chunkPar innerSize outerSize as
= concat ((chunksOf innerSize as) `using` (parBuffer outerSize rdeepseq))
The latter one ended up being MUCH more composable, since I could just write:
take n $ catMaybes $ chunkPar 1000 10 $ map expensiveFunction xs
... rather than baking in the take
and catMaybes
behavior into the parallelism strategy.
The latter solution also gives nearly perfect utilization. On the embarrassingly parallel problem I tested it on, it gave 99% utilization for 8 cores. I didn't test the utilization of the Par
monad because I was borrowing a colleague's computer and didn't want to waste their time when I was satisfied with the performance of Control.Parallel.Strategies
.
So the answer is to use Control.Parallel.Strategies
, which gives much more composable behavior and great multi-core utilization.
Upvotes: 4