Reputation: 1429
I have written a simple implementation of the Sieve of Eratosthenes, and I would like to know if there is a more efficient way to perform one of the steps.
def eratosthenes(n):
primes = [2]
is_prime = [False] + ((n - 1)/2)*[True]
for i in xrange(len(is_prime)):
if is_prime[i]:
p = 2*i + 1
primes.append(p)
is_prime[i*p + i::p] = [False]*len(is_prime[i*p + i::p])
return primes
I am using Python's list slicing to update my list of booleans is_prime
. Each element is_prime[i]
corresponds to an odd number 2*i + 1
.
is_prime[i*p + i::p] = [False]*len(is_prime[i*p + i::p])
When I find a prime p
, I can mark all elements corresponding to multiples of that prime False
, and since all multiples smaller than p**2
are also multiples of smaller primes, I can skip marking those. The index of p**2
is i*p + i
.
I'm worried about the cost of computing [False]*len(is_prime[i*p + 1::p])
and I have tried to compare it to two other strategies that I couldn't get to work.
For some reason, the formula (len(is_prime) - (i*p + i))/p
(if positive) is not always equal to len(is_prime[i*p + i::p])
. Is it because I've calculated the length of the slice wrong, or is there something subtle about slicing that I haven't caught?
When I use the following lines in my function:
print len(is_prime[i*p + i::p]), ((len(is_prime) - (i*p + i))/p)
is_prime[i*p + i::p] = [False]*((len(is_prime) - (i*p + i))/p)
I get the following output (case n = 50
):
>>> eratosthenes2(50)
7 7
3 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 9, in eratosthenes2
ValueError: attempt to assign sequence of size 2 to extended slice of size 3
I also tried replacing the bulk updating line with the following:
for j in xrange(i*p + i, len(is_prime), p):
is_prime[j] = False
But this fails for large values of n
because xrange
doesn't take anything bigger than a long. I gave up on trying to wrestle itertools.count
into what I needed.
Are there faster and more elegant ways to bulk-update the list slice? Is there anything I can do to fix the other strategies that I tried, so that I can compare them to the working one? Thanks!
Upvotes: 4
Views: 1479
Reputation: 30161
Use itertools.repeat()
:
is_prime[i*p + 1::p] = itertools.repeat(False, len(is_prime[i*p + 1::p]))
The slicing syntax will iterate over whatever you put on the right-hand side; it doesn't need to be a full-blown sequence.
So let's fix that formula. I'll just borrow the Python 3 formula since we know that works:
1 + (hi - 1 - lo) / step
Since step > 0
, hi = stop
and lo = start
, so we have:
1 + (len(is_prime) - 1 - (i*p + 1))//p
(//
is integer division; this future-proofs our code for Python 3, but requires 2.7 to run).
Now, put it all together:
slice_len = 1 + (len(is_prime) - 1 - (i*p + 1))//p
is_prime[i*p + 1::p] = itertools.repeat(False, slice_len)
Python 3 users: Please do not use this formula directly. Instead, just write len(range(start, stop, step))
. That gives the same result with similar performance (i.e. it's O(1)) and is much easier to read.
Upvotes: 2