Reputation: 303
I have the following set of functions to count the number of primes less than or equal to a number n in Haskell.
The algorithm takes a number, checks if it is divisible by two and then checks if it divisible by odd numbers up to the square root of the number being checked.
-- is a numner, n, prime?
isPrime :: Int -> Bool
isPrime n = n > 1 &&
foldr (\d r -> d * d > n || (n `rem` d /= 0 && r))
True divisors
-- list of divisors for which to test primality
divisors :: [Int]
divisors = 2:[3,5..]
-- pi(n) - the prime counting function, the number of prime numbers <= n
primesNo :: Int -> Int
primesNo 2 = 1
primesNo n
| isPrime n = 1 + primesNo (n-1)
| otherwise = 0 + primesNo (n-1)
main = print $ primesNo (2^22)
Using GHC with the -O2 optimisation flag, counting the number of primes for n = 2^22 takes ~3.8sec on my system. The following C code take ~ 0.8 sec:
#include <stdio.h>
#include <math.h>
/*
compile with: gcc -std=c11 -lm -O2 c_primes.c -o c_orig
*/
int isPrime(int n) {
if (n < 2)
return 0;
else if (n == 2)
return 1;
else if (n % 2 == 0)
return 0;
int uL = sqrt(n);
int i = 3;
while (i <= uL) {
if (n % i == 0)
return 0;
i+=2;
}
return 1;
}
int main() {
int noPrimes = 0, limit = 4194304;
for (int n = 0; n <= limit; n++) {
if (isPrime(n))
noPrimes++;
}
printf("Number of primes in the interval [0,%d]: %d\n", limit, noPrimes);
return 0;
}
This algorithm take about 0.9 sec in Java and 1.8 sec in JavaScript (on Node) so it just feels that the Haskell version is slower than I would expect be. Is there anyway I can more efficiently code this in Haskell without changing the algorithm?
The following version of isPrime offered by @dfeuer shaves one second off the running time taking it down to 2.8sec (down from 3.8). Though this is still slower than JavaScript (Node) which takes approx 1.8 sec as shown here, Yet Another Language Speed Test.
isPrime :: Int -> Bool
isPrime n
| n <= 2 = n == 2
| otherwise = odd n && go 3
where
go factor
| factor * factor > n = True
| otherwise = n `rem` factor /= 0 && go (factor+2)
In the above isPrime function, the function go calls factor * factor for each divisor for a single n. I would imagine that it would be more efficient to compare factor to the square root of n as this would only have to be calculated once per n. However, using the following code, computation time is increased by approximately 10%, is the square root of n being re-calculated every time the inequality is evaluated (for each factor)?
isPrime :: Int -> Bool
isPrime n
| n <= 2 = n == 2
| otherwise = odd n && go 3
where
go factor
| factor > upperLim = True
| otherwise = n `rem` factor /= 0 && go (factor+2)
where
upperLim = (floor.sqrt.fromIntegral) n
Upvotes: 5
Views: 994
Reputation: 52029
In general I've found that looping in Haskell is about 3-4 times slower than what can be accomplished with C.
To help understand the performance difference I slightly modified the programs so that a fixed number of divisor tests are made per iteration and added a parameter e to control how many iterations are made - the number of (outer) iterations performed is 2^e. For each outer iteration approx. 2^21 divisor tests are made.
The source code for each program and scripts to run and analyze the results made be found here: https://github.com/erantapaa/loopbench
Pull-requests to improve the benchmarking are welcome.
Here are the results I get on a 2.4 GHz Intel Core 2 Duo using ghc 7.8.3 (under OSX). The gcc used was "Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)".
e ctime htime allocated gc-bytes alloc/iter h/c dns
10 0.0101 0.0200 87424 3408 1.980 4.61
11 0.0151 0.0345 112000 3408 2.285 4.51
12 0.0263 0.0700 161152 3408 2.661 5.09
13 0.0472 0.1345 259456 3408 2.850 5.08
14 0.0819 0.2709 456200 3408 3.308 5.50
15 0.1575 0.5382 849416 9616 3.417 5.54
16 0.3112 1.0900 1635848 15960 3.503 5.66
17 0.6105 2.1682 3208848 15984 3.552 5.66
18 1.2167 4.3536 6354576 16032 24.24 3.578 5.70
19 2.4092 8.7336 12646032 16128 24.12 3.625 5.75
20 4.8332 17.4109 25229080 16320 24.06 3.602 5.72
e = exponent parameter
ctime = running time of the C program
htime = running time of the Haskell program
allocated = bytes allocated in the heap (Haskell program)
gc-bytes = bytes copied during GC (Haskell program)
alloc/iter = bytes allocated in the heap / 2^e
h / c = htime divided by ctime
dns = (htime - ctime) divided by the number of divisor tests made
in nanoseconds
# divisor tests made = 2^e * 2^11
Some observations:
It is well known that GHC does not produce the same tight loop code that a C compiler produces. The penalty you pay is approx. 4.6 ns per iteration. Moreover, it looks like Haskell is also affected by cache effects due to heap allocation.
24 bytes per allocation and 5 ns per loop iteration is not a lot for most programs, but when you have 2^20 allocations and 2^40 loop iterations it becomes a factor.
Upvotes: 3
Reputation: 30103
The C code uses 32-bit integers, while the Haskell code uses 64-bit integers.
The original C code runs in 0.63 secs on my computer. However, if I replace the int
-s with long
-s, it runs in 2.07 seconds with gcc and 2.17 secs with clang.
In comparison, the updated isPrime
function (see it in the thread question) runs in 2.09 seconds (with -O2 and -fllvm). Note that is slightly better than the clang-compiled C code, even though they use the same LLVM code generator.
The original Haskell code runs in 3.2 secs, which I think is an acceptable overhead for the convenience of using lists for iteration.
Upvotes: 2
Reputation: 71070
Inline everything, loose the superfluous tests, add strictness annotations just to be sure:
{-# LANGUAGE BangPatterns #-}
-- pi(n) - the prime counting function, the number of prime numbers <= n
primesNo :: Int -> Int
primesNo n
| n < 2 = 0
| otherwise = g 3 1
where
g k !cnt | k > n = cnt
| go 3 = g (k+2) (cnt+1)
| otherwise = g (k+2) cnt
where go f
| f*f > k = True
| otherwise = k `rem` f /= 0 && go (f+2)
main = print $ primesNo (2^22)
The go
testing function is as in dfeuer's answer. Compile with -O2 as usual, and always test by running a standalone executable (with something like > test +RTS -s
).
Calls to g
can be made direct (that's really micro-optimizing it):
primesNo n
| n < 2 = 0
| otherwise = g 3 1
where
g k !cnt | k > n = cnt
| otherwise = go 3
where go f
| f*f > k = g (k+2) (cnt+1)
| k `rem` f == 0 = g (k+2) cnt
| otherwise = go (f+2)
More substantial change (still keeping the algorithm arguably the same) which might or mightn't speed it up is to turn it inside out, to spare the squares computations: test by [3]
all odds from 9 to 23, by [3,5]
all odds from 25 to 47, etc., along the lines of this segmented code:
import Data.List (inits)
primesNo n = length (takeWhile (<= n) $ 2 : oddprimes)
where
oddprimes = sieve 3 9 [3,5..] (inits [3,5..])
sieve x q ~(_:t) (fs:ft) =
filter ((`all` fs) . ((/=0).) . rem) [x,x+2..q-2]
++ sieve (q+2) (head t^2) t ft
Sometimes tweaking your code into using and
instead of all
changes the speed too. Further speedup might be attempted by inlining and simplifying everything (replace length
with counting etc.).
Upvotes: 1
Reputation: 48580
I urge you to use a different algorithm, such as the Sieve of Eratosthenes discussed in the paper by Melissa O'Neill, or the version used in Math.NumberTheory.Primes
from the arithmoi package, which also offers an optimized prime counting function. However, this might get you better constant factors:
-- is a number, n, prime?
isPrime :: Int -> Bool
isPrime n
| n <= 2 = n == 2
| otherwise = odd n && -- Put the 2 here instead
foldr (\d r -> d * d > n || (n `rem` d /= 0 && r))
True divisors
-- list of divisors for which to test primality
divisors :: [Int]
{-# INLINE divisors #-} -- No guarantee, but it might possibly inline and stay inlined,
-- so the numbers will be generated on each call instead of
-- being pulled in (expensively) from RAM.
divisors = [3,5..] -- No more 2:
The reason to get rid of the 2:
is that an optimization called "foldr/build fusion", "short cut deforestation", or just "list fusion" can, potentially, make your divisors list go away, but, at least with GHC < 7.10.1, that 2:
will block the optimization.
Edit: it seems that's not working for you, so here's something else to try:
isPrime n
| n <= 2 = n == 2
| otherwise = odd n && go 3
where
go factor
| factor * factor > n = True
| otherwise = n `rem` factor /= 0 && go (factor+2)
Upvotes: 4