Reputation: 4506
The following code works with small lists, however it takes forever with long lists, I suppose it's my double use of length that is the problem.
ratioOfPrimes :: [Int] -> Double
ratioOfPrimes xs = fromIntegral (length (filter isPrime xs))/ fromIntegral(length xs)
How do I calculate the ratio of an element in longer lists?
Upvotes: 1
Views: 278
Reputation: 43320
The double use of length
isn't the main problem here. The multiple traversals in your implementation produce a constant factor and with double length
and filter
you get the avg complexity of O(3n)
. Due to Stream Fusion it's even O(2n)
, as already mentioned by Impredicative. But in fact since the constant factors don't have a dramatic effect on performance, it's even conventional to simply ignore them, so, conventionally speaking, your implementation still has the complexity of O(n)
, where n
is the length of the input list.
The real problem here is that the above would all be true only if isPrime
had the complexity of O(1)
, but it doesn't. This function performs a traversal thru a list of all primes, so it itself has the complexity of O(m)
. So the dramatic performance decrease here is caused by your algorithm having the final complexity of O(n*m)
, because on each iteration of the input list it has to traverse the list of all primes to an unknown depth.
To optimize I suggest to first sort the input list (takes O(n*log n)
) and itegrate a custom lookup on a list of all primes, which will drop the already visited numbers on each iteration. This way you'll be able to achieve a single traversal on the list of all primes, which theoretically could grant you with the complexity of O(n*log n + n + m)
, which again, conventionally can be thought of as simply O(n*log n)
, by highlighting the cost center.
Upvotes: 4
Reputation: 5069
So, there's a few things going on there. Let's look at some of the operations involved:
length
filter
isPrime
length
As you say, using length
twice isn't going to help, since that's O(n)
for lists. You do that twice. Then there's filter
, which is also going to do a whole pass of the list in O(n)
. What we'd like to do is do all this in a single pass of the list.
Functions in the Data.List.Stream
module implement a technique called Stream Fusion, which would for example rewrite your (length (filter isPrime xs))
call into a single loop. However, you'd still have the second call to length. You could rewrite this whole thing into a single fold (or use of the State or ST monads) with a pair of accumulators and do this in a single pass:
ratioOfPrimes xs = let
(a,b) = foldl' (\(odd,all) i -> if (isPrime i) then (odd +1, all+1) else (odd, all+1)) (0,0) xs
in a/b
However, in this case you could also move away from using a list and use the vector library. The vector library implements the same stream fusion techniques for removing intermediate lists, but also has some other nifty features:
length
is O(1)
Data.Vector.Unboxed
module lets you store unboxable types (which primitive types such as Int
certainly are) without the overhead of the boxed representation. So this list of ints would be stored as a low-level Int
array.Using the vector
package should let you write the idiomatic representation you have above and get better than the performance of a single-pass translation.
import qualified Data.Vector.Unboxed as U
ratioOfPrimes :: U.Vector Int -> Double
ratioOfPrimes xs = (fromIntegral $ U.length . U.filter isPrime $ xs) / (fromIntegral $ U.length xs)
Of course, the thing that hasn't been mentioned is the isPrime
function, and whether the real problem is that it's slow for large n
. An unperformant prime checker could easily blow concerns over list indexing out of the water.
Upvotes: 3