Reputation: 53047
For a specific task, I need a lot of fast, individual writes in a mutable array. In order to check the performance, I've used the following test:
size :: Int
size = 256*256*16
arr :: UArray Int Int
arr = runST $ do
  arr <- newArray (0,size) 0 :: ST s (STUArray s Int Int)
  forM_ [0..size] $ \i -> do
    writeArray arr i i
  unsafeFreeze arr
arr_sum = foldl' (\ sum i -> sum + (arr ! i)) 0 [0..size-1]
main = print arr_sum
Here is the result:
vh:haskell apple1$ ghc -O3 bench.hs -o bench; time ./bench
Linking bench ...
549755289600
real 0m0.748s
user 0m0.697s
sys 0m0.048s
I suspected it shouldn't take 0.7s to fill a 256*256*16 array in memory, so I tested an equivalent program in JavaScript:
size = 256*256*16;
x = new Array(size);
s = 0;
for (var i=0; i<size; ++i)
  x[i] = i;
for (var i=0; i<size; ++i)
  s += x[i];
console.log(s);
And the result is:
vh:haskell apple1$ time node bench.js
549755289600
real 0m0.175s
user 0m0.150s
sys 0m0.024s
In C, the time was 0.012s, which is a good lower bound:
#include <stdio.h>
#define SIZE (256*256*16)
double x[SIZE];
int main(){
    int i;
    double s = 0;
    for (i = 0; i < SIZE; ++i)
        x[i] = i;
    for (i = 0; i < SIZE; ++i)
        s += x[i];
    printf("%f", s);
}
So that pretty much confirms my hypothesis that my Haskell program is doing something other than just filling the array and summing it afterwards. There is probably a hidden stack somewhere, but I cannot identify it, since I used foldl' and forM_, which I believed were compiled to stack-free code. So, what is the source of the inefficiency here?
Upvotes: 4
Views: 229
Reputation: 53047
Sorry, I guess only I can answer this question properly. The reason, for anyone curious, has nothing to do with the code: GHC was not recompiling my previously built binaries with -O2 when I benchmarked them. The solution was to use the -fforce-recomp flag:
ghc -fforce-recomp -O2 bench.hs -o bench
A better solution, suggested by people on #haskell @ freenode, is to set up Cabal properly, and build using it.
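For reference, a minimal sketch of the kind of Cabal stanza that bakes the optimization flags into every build, so stale binaries can't sneak into a benchmark (the names here are hypothetical):

```cabal
executable bench
  main-is:       bench.hs
  build-depends: base, array
  ghc-options:   -O2
```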
Upvotes: 3
Reputation: 16645
This is too big for a comment, but not really an answer. Your imports were a little annoying to track down, and I also squashed the warnings from -Wall (important to pay attention to when you're looking at performance):
module Main where
import Data.Array.Unboxed
import Data.Array.ST
import Data.Array.Unsafe
import Control.Monad.ST
import Control.Monad
import Data.List
size :: Int
size = 256*256*16
ar :: UArray Int Int
ar = runST $ do
  a <- newArray (0,size) 0 :: ST s (STUArray s Int Int)
  forM_ [0..size] $ \i -> do
    writeArray a i i
  unsafeFreeze a
arrSum :: Int
arrSum = foldl' (\ s i -> s + (ar ! i)) 0 [0..size-1]
main :: IO ()
main = print arrSum
For Haskell and Node respectively:
jberryman /tmp » time ./t
-524288
./t 0.04s user 0.01s system 92% cpu 0.056 total
jberryman /tmp » time nodejs t.js
549755289600
nodejs t.js 0.19s user 0.01s system 100% cpu 0.200 total
I get basically the same timing for GHC 7.8 and 7.6 (where I have to import Data.Array.ST hiding (unsafeFreeze), but the code is otherwise the same).
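As an aside, the fill-then-freeze pattern can also be written with runSTUArray, which performs the freeze safely and drops both the explicit runST and the unsafeFreeze; a minimal self-contained sketch:

```haskell
module Main where

import Control.Monad (forM_)
import Data.Array.ST (newArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, elems)
import Data.List (foldl')

size :: Int
size = 256*256*16

-- runSTUArray runs the ST action and freezes the result in one step,
-- so there is no window in which the frozen array could still be mutated.
ar :: UArray Int Int
ar = runSTUArray $ do
  a <- newArray (0, size-1) 0
  forM_ [0..size-1] $ \i -> writeArray a i i
  return a

arrSum :: Int
arrSum = foldl' (+) 0 (elems ar)

main :: IO ()
main = print arrSum
```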
EDIT: Oops, look at me not being very observant; notice that on my 32-bit machine the count overflows in Haskell but not in JS, so we have another apples-to-oranges comparison; a fairer comparison might use Integer here.
I definitely recommend criterion for doing any kind of micro-benchmarking; otherwise you're setting yourself up to waste a lot of time.
Also, I don't believe the C version pays any run-time cost to initialize its array (a global array is placed in zero-initialized memory before main runs), so it's not quite a fair comparison either.
Upvotes: 2
Reputation: 52057
GHC does not produce tight loops like the ones you get from C. A factor of 3 in run time is about par for the course, in my experience.
To get better performance use the Vector library:
import qualified Data.Vector.Unboxed as V
size = 256*256*16 :: Int
doit = V.foldl' (+) 0 vec
  where vec = V.generate size id
main = print doit
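Since the question specifically wanted lots of individual mutable writes, the same library also exposes a mutable interface; a sketch using Data.Vector.Unboxed.Mutable (assuming the vector package is available):

```haskell
module Main where

import Control.Monad (forM_)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV

size :: Int
size = 256*256*16

fillAndSum :: IO Int
fillAndSum = do
  v <- MV.new size                          -- allocate a mutable unboxed vector
  forM_ [0..size-1] $ \i -> MV.write v i i  -- individual writes, as in the question
  frozen <- V.unsafeFreeze v                -- O(1); v must not be reused afterwards
  return (V.foldl' (+) 0 frozen)

main :: IO ()
main = print =<< fillAndSum
```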
Upvotes: 5