Gabriella Gonzalez

Reputation: 35099

Throttling parallel computations

The Question

I have a finite list of values:

values :: [A]

... and an expensive, but pure, function on those values:

expensiveFunction :: A -> Maybe B

How do I run that function on each value in parallel, return only the first n results that complete with a Just, and stop computing the unfinished results?

takeJustsPar :: (NFData b) => Int -> (a -> Maybe b) -> [a] -> [b]
takeJustsPar maxJusts f as = ???
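
For concreteness, here is one hypothetical way the pieces could be instantiated (the types and the divisor check below are made up purely for illustration, not part of the actual problem):

type A = Integer
type B = Integer

-- Expensive but pure: the smallest nontrivial divisor, if one exists
expensiveFunction :: A -> Maybe B
expensiveFunction n = case [ d | d <- [2 .. n - 1], n `mod` d == 0 ] of
    (d:_) -> Just d
    []    -> Nothing

values :: [A]
values = [1000000 .. 2000000]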

The Motivation

I know how I would do this using Control.Concurrent, but I wanted to experiment with Haskell's parallelism features. Also, the (scant) literature I could find seems to indicate that Haskell's parallelism features make it cheap to spawn parallel computations and to balance the workload across the available capabilities.

Upvotes: 4

Views: 186

Answers (1)

Gabriella Gonzalez

Reputation: 35099

I attempted two solutions. The first uses the Par monad (i.e. Control.Monad.Par):

import Control.Monad.Par (Par, NFData)
import Control.Monad.Par.Combinator (parMap)
import Data.Maybe (catMaybes)
import Data.List.Split (chunksOf)

takeJustsPar :: (NFData b) => Int -> Int -> (a -> Maybe b) -> [a] -> Par [b]
takeJustsPar n chunkSize f as = go n (chunksOf chunkSize as) where
    go _ [] = return []
    -- Stop as soon as enough Justs have been found; a guard rather than a
    -- match on 0, since a chunk can overshoot the target
    go numNeeded _ | numNeeded <= 0 = return []
    go numNeeded (chunk:chunks) = do
        -- Evaluate one chunk in parallel and count how many Justs it produced
        evaluatedChunk <- parMap f chunk
        let results      = catMaybes evaluatedChunk
            numFound     = length results
            numRemaining = numNeeded - numFound
        fmap (results ++) $ go numRemaining chunks
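
To actually run this, the Par computation has to be unwrapped with runPar. A minimal sketch, assuming the illustrative expensiveFunction and values from the question (the target of 10 results and the chunk size of 100 are arbitrary choices):

import Control.Monad.Par (runPar)

-- Roughly the first 10 Justs; whole chunks are evaluated, so it may overshoot
firstJusts :: [B]
firstJusts = runPar (takeJustsPar 10 100 expensiveFunction values)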

The second attempt used Control.Parallel.Strategies:

import Control.Parallel.Strategies
import Data.List.Split (chunksOf)

chunkPar :: (NFData a) => Int -> Int -> [a] -> [a]
chunkPar innerSize outerSize as
  = concat ((chunksOf innerSize as) `using` (parBuffer outerSize rdeepseq))

The latter one ended up being MUCH more composable, since I could just write:

take n $ catMaybes $ chunkPar 1000 10 $ map expensiveFunction xs

... rather than baking the take and catMaybes behavior into the parallelism strategy.
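
For completeness, a minimal driver for that pipeline could look like the following, assuming chunkPar from above and the illustrative expensiveFunction and values sketched in the question (the chunk sizes and the count of 10 are arbitrary choices):

import Data.Maybe (catMaybes)

main :: IO ()
main =
    -- Laziness does the throttling: once take has its 10 results, the rest of
    -- the list is never demanded, so parBuffer stops sparking new chunks.
    print (take 10 (catMaybes (chunkPar 1000 10 (map expensiveFunction values))))

Note that to get any parallelism at all, the program has to be compiled with -threaded (e.g. ghc -O2 -threaded) and run with +RTS -N so that the sparks are spread across multiple capabilities.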

The latter solution also gives nearly perfect utilization: on the embarrassingly parallel problem I tested it on, it gave 99% utilization across 8 cores. I didn't measure the utilization of the Par monad version because I was borrowing a colleague's computer and didn't want to take up more of their time once I was satisfied with the performance of Control.Parallel.Strategies.

So the answer is to use Control.Parallel.Strategies, which gives much more composable behavior and great multi-core utilization.

Upvotes: 4
