Valerie94

Reputation: 303

Worse performance with forConcurrently than sequential

I've written a function that up-samples a file from 48kHz to 192kHz by means of a filter:

upsample :: Coefficients -> FilePath -> IO ()

It takes the filter coefficients and the path of the file to be up-sampled, and writes the result to a new file.

I have to up-sample many files, so I've written a function to up-sample a whole directory in parallel, using forConcurrently_ from Control.Concurrent.Async:

upsampleDirectory :: Directory -> FilePath -> IO ()
upsampleDirectory dir coefPath = do
  files <- getAllFilesFromDirectory dir
  coefs <- loadCoefficients coefPath
  forConcurrently_ files $ upsample coefs

I'm compiling with the -threaded option and running with +RTS -N2. What I see is that up-sampling 2 files sequentially is faster than up-sampling both files in parallel.
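
For reference, I'm building and running roughly along these lines (upsample.hs and the executable name are stand-ins for my actual files):

ghc -threaded -rtsopts upsample.hs
./upsample +RTS -N2 -RTS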

Upsampling file1.wav takes 18.863s. Upsampling file2.wav takes 18.707s. Upsampling a directory with file1.wav and file2.wav takes 66.250s.

What am I doing wrong?

I've tried to keep this post concise, so ask me if you need more details on some of the functions.

Upvotes: 2

Views: 141

Answers (1)

K. A. Buhr

Reputation: 50829

Here are a couple of possibilities. First, make yourself 100% sure you're actually running your program with +RTS -N2 -RTS. I can't tell you how many times I've been benchmarking a parallel program and written:

stack exec myprogram +RTS -N2 -RTS

in place of:

stack exec myprogram -- +RTS -N2 -RTS

and gotten myself hopelessly confused. (The first version runs the stack executable on two processors but the target executable on one!) Maybe add a print =<< getNumCapabilities at the beginning of your main program to be sure.
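
Something like this at the top of main works (a minimal sketch):

module Main where

import Control.Concurrent (getNumCapabilities)

main :: IO ()
main = do
  n <- getNumCapabilities
  putStrLn ("capabilities: " ++ show n)  -- expect 2 when run with +RTS -N2
  -- ... the rest of your program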

After confirming you're running on two processors, the next most likely issue is that your implementation is not running in constant space and is blowing up the heap. Here's a simple test program I used to try to duplicate your problem. (Feel free to use my awesome upsampling filter yourself!)

module Main where

import Control.Concurrent.Async
import System.Environment
import qualified Data.ByteString as B

upsample :: FilePath -> IO ()
upsample fp = do c <- B.readFile fp
                 -- "filter": just repeat each byte four times (4x upsample)
                 let c' = B.pack $ concatMap (replicate 4) $ B.unpack c
                 B.writeFile (fp ++ ".out") c'

upsampleFiles :: [FilePath] -> IO ()
upsampleFiles files = do
  forConcurrently_ files upsample

main :: IO ()
main = upsampleFiles =<< getArgs   -- upsample all files given on the command line

When I ran this on a single 70meg test file, it ran in 14 secs. When I ran it on two copies in parallel, it ran for more than a minute before it started swapping like mad, and I had to kill it. After switching to:

import qualified Data.ByteString.Lazy as B

it ran in 3.7 secs on a single file, 7.8 secs on two copies on a single processor, and 4.0 secs on two copies on two processors with +RTS -N2.
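
Incidentally, with the lazy interface you can skip the pack/unpack round trip entirely; a variant along these lines (just a sketch, using Data.ByteString.Lazy's concatMap and replicate) should behave the same:

upsample :: FilePath -> IO ()
upsample fp = do c <- B.readFile fp
                 -- emit each byte as a 4-byte run; lazy I/O streams chunk by chunk
                 B.writeFile (fp ++ ".out") (B.concatMap (B.replicate 4) c)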

Make sure you're compiling with optimizations on, profile your program, and make sure it's running in a constant (or at least reasonable) heap space. The above program runs in a constant 100k bytes of heap. A similar version that uses a strict ByteString for reading and a lazy ByteString for writing reads the whole file into memory, so the heap grows to 70megs (the size of the file) within a fraction of a second and then stays constant while the file is processed.
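
To check the heap behavior without setting up full profiling, the RTS statistics flag is enough (a sketch; assumes the executable was built with -rtsopts, and myprogram is a placeholder as above):

stack exec myprogram -- file1.wav +RTS -s -N2

Look for the "maximum residency" figure in the summary; it should stay roughly constant rather than climbing toward gigabytes.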

No matter how complicated your filter is, if your program is growing gigabytes of heap, the implementation is broken, and you'll need to fix it before you worry about performance, parallel or otherwise.

Upvotes: 2
