
Reputation: 49

Why does my prime number sieve return the same result slower than the brute force method for finding primes in Python 2.7?

I am fairly new to Python and I have been trying to find a fast way to list all primes up to a given number.

When I use the Sieve of Eratosthenes, as in the following code:

#Finding primes till 40000.
import time
start = time.time()
def prime_eratosthenes(n):
    list = []
    prime_list = []
    for i in range(2, n+1):
        if i not in list:
            prime_list.append(i)
            for j in range(i*i, n+1, i):
                list.append(j)
    return prime_list

lists = prime_eratosthenes(40000)
print lists
end = time.time()
runtime = end - start
print "runtime =",runtime

Along with the list containing the primes, I get a line like the one below as output:

runtime = 20.4290001392

Depending on the machine, the value I get stays consistently within a range of ±0.5 seconds.

However, when I try to find the primes up to 40000 using a brute-force method, as in the following code:

import time
start = time.time()
prime_lists = []
for i in range(1,40000+1):
    for j in range(2,i):
        if i%j==0:
            break
    else:
        prime_lists.append(i)
print prime_lists
end = time.time()
runtime = end - start
print "runtime =",runtime

This time, along with the list of primes, I get a smaller value for the runtime:

runtime = 16.0729999542

This value also only varies within a range of ±0.5.

Clearly, the sieve is slower than the brute-force method.

I also observed that the gap between the two runtimes only widens as the limit n increases.

Can anyone give a logical explanation for this behavior? I expected the sieve to be more efficient than the brute-force method, but the opposite seems to be the case here.

Upvotes: 1

Views: 143

Answers (1)

qwr

Reputation: 10929

While appending to a list is not the best way to implement this algorithm (the original algorithm uses a fixed-size array), appending is at least amortized constant time. A bigger issue is the check "if i not in list", which does a linear scan of the list on every iteration, and that list keeps growing (it even accumulates duplicates, since a composite like 12 is appended once for each of its prime factors). The best change you can make for larger inputs is to have the outer for loop only check up to sqrt(n): every composite up to n has a factor no larger than sqrt(n), so the smaller values of i have already struck off everything the larger ones would, and this saves a lot of computation.
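
For example, a minimal rework of the original function along those lines might look like this (a sketch; the name prime_eratosthenes_fast is mine, not from the question):

import math

def prime_eratosthenes_fast(n):
    # A set gives O(1) average membership tests, unlike a list's linear scan.
    composites = set()
    # Any composite <= n has a factor <= sqrt(n), so the outer loop can stop there.
    for i in range(2, int(math.sqrt(n)) + 1):
        if i not in composites:
            for j in range(i*i, n+1, i):
                composites.add(j)
    # Everything never marked as composite is prime.
    return [i for i in range(2, n+1) if i not in composites]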

A better approach is to keep a boolean array that keeps track of struck-off numbers, like what is seen in the Wikipedia article for the Sieve. This way, checking whether a number has already been struck off is constant time, since it's just an array access.

For example:

def sieve(n):
    # nums[i] == 0 means i is (so far) presumed prime; 1 means struck off.
    nums = [0] * n
    for i in range(2, int(n**0.5)+1):
        if nums[i] == 0:
            # Strike off multiples of i, starting at i*i (smaller multiples
            # were already struck off by smaller primes).
            for j in range(i*i, n, i):
                nums[j] = 1
    # All indices never struck off are the primes below n.
    return [i for i in range(2, n) if nums[i] == 0]
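
For instance, to reproduce the timing from the question (note that sieve(n) returns primes strictly below n, so pass 40001 to include 40000):

import time
start = time.time()
primes = sieve(40001)   # primes up to and including 40000
print primes
end = time.time()
print "runtime =", end - start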

So to answer your question: the linear-time membership test inside your two for loops makes your algorithm do potentially O(n^2) work, while being smart about the outer for loop makes the array-based sieve take at most O(n sqrt(n)) time (its true bound is the classic O(n log log n), so in practice the runtime is close to linear in n).
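
If you want to verify the difference yourself, a quick timing harness along these lines (assuming prime_eratosthenes from the question and sieve from above are both defined in the same session) should show the gap widening as n grows:

import time

def time_it(f, n):
    # Run f(n) once and return the elapsed wall-clock time in seconds.
    start = time.time()
    f(n)
    return time.time() - start

for n in (10000, 20000, 40000):
    print n, time_it(prime_eratosthenes, n), time_it(sieve, n)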

Upvotes: 1
