user3692497


Haskell Vector performance woes

I wrote the classic prime number sieve algorithm in C which pretty much instantly generates all the primes smaller than 1,000,000 on my machine (I didn't do anything to optimize it).

In Haskell I use a Vector of Ints, and generate an update list for each prime number which sets the multiples to zero. This is basically the same thing as I do in C with a nested for loop:

import qualified Data.Vector.Unboxed as U

-- | generate primes smaller than p
sievePrimes :: Int -> U.Vector Int
sievePrimes p = foldl f ns [0..p-1]
    where
        ns = U.fromList $ 0:[2..p]
        f v i
            | v U.! i == 0 = v
            | otherwise = v U.// [((i + 1)*k - 1, 0) | k <- [2..p `div` (i + 1)]]

The Haskell version runs about 1000 times slower than the C version. All I've done to optimize is to use the vector package. The algorithm is the same.

Why is it running so much slower? Isn't Vector efficient enough?

I compiled both C and Haskell with -O2.

Upvotes: 1

Views: 438

Answers (1)

Will Ness

Reputation: 71070

You need to replace your foldl call with foldl f ns [0 .. floor . sqrt . fromIntegral $ p]. This radically improves the time complexity, as explained on the Haskell wiki. The reason is that U.// applied to a list of m updates on a vector of length n costs O(m+n), because it copies the whole vector, whereas in languages with destructive update, like C, updating m cells costs only O(m). That is why not stopping as soon as possible doesn't worsen the time complexity in C.

But with immutable data, as with Haskell's lists and Vectors, the update operation is asymptotically more expensive, so it is crucial to stop as soon as possible.
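Putting the fix together, this is the question's code with only the fold's upper bound changed (primesTo is a small helper added here just to read off the result, not part of the original code):

```haskell
import qualified Data.Vector.Unboxed as U

-- | Generate primes smaller than p; identical to the question's code
-- except that the fold stops at sqrt p instead of p - 1.
sievePrimes :: Int -> U.Vector Int
sievePrimes p = foldl f ns [0 .. floor . sqrt . fromIntegral $ p]
    where
        ns = U.fromList $ 0 : [2 .. p]
        f v i
            | v U.! i == 0 = v
            | otherwise = v U.// [((i + 1) * k - 1, 0) | k <- [2 .. p `div` (i + 1)]]

-- | The non-zero entries of the sieve are the primes.
primesTo :: Int -> [Int]
primesTo = U.toList . U.filter (/= 0) . sievePrimes
```

For example, primesTo 30 yields [2,3,5,7,11,13,17,19,23,29].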

When analyzing the performance of a piece of code, always analyze its empirical orders of growth: measure its run times t1 and t2 at two problem sizes n1 and n2, and calculate logBase (n2/n1) (t2/t1).
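That calculation, and the extrapolation used below, can be wrapped in two tiny helpers (orderOfGrowth and projectTime are hypothetical names introduced here for illustration):

```haskell
-- | Empirical order of growth b from two measurements (n1, t1) and (n2, t2),
-- assuming t ~ n^b locally: b = logBase (n2/n1) (t2/t1).
orderOfGrowth :: Double -> Double -> Double -> Double -> Double
orderOfGrowth n1 t1 n2 t2 = logBase (n2 / n1) (t2 / t1)

-- | Extrapolate a run time t measured at size n to a new size n',
-- assuming the empirical order b stays the same.
projectTime :: Double -> Double -> Double -> Double -> Double
projectTime t n b n' = t * (n' / n) ** b
```

For instance, perfectly quadratic data such as (10, 1) and (100, 100) gives an order of 2.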

The updated code, interpreted in GHCi on my computer, generates the n = 78,498 primes below N = 1,000,000 in 3.53s, running at ~N^1.1 or ~n^1.2, which is very decent.

Your code, measured at N = 20,000 and 40,000, runs at ~N^1.7 or ~n^1.9, which is terribly slow; the projected run time at N = 1,000,000 is 7 minutes, if the complexity stays the same.
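For comparison with the destructive update the C version enjoys, the same sieve can also be written against the vector package's mutable API in ST, where each write is in place and O(1); a sketch, assuming Data.Vector.Unboxed.Mutable's thaw/read/write interface:

```haskell
import Control.Monad (forM_, when)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M

-- | The same sieve with in-place writes: zeroing m multiples costs O(m),
-- as in C, instead of the O(m+n) copy that each U.// update performs.
sievePrimesM :: Int -> U.Vector Int
sievePrimesM p = U.create $ do
    v <- U.thaw . U.fromList $ 0 : [2 .. p]
    forM_ [0 .. floor . sqrt . fromIntegral $ p] $ \i -> do
        x <- M.read v i
        when (x /= 0) $
            forM_ [2 .. p `div` (i + 1)] $ \k ->
                M.write v ((i + 1) * k - 1) 0
    return v
```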

Upvotes: 1
