castle-bravo
castle-bravo

Reputation: 1429

Bulk updating a slice of a Python list

I have written a simple implementation of the Sieve of Eratosthenes, and I would like to know if there is a more efficient way to perform one of the steps.

def eratosthenes(n):
    primes = [2]
    is_prime = [False] + ((n - 1)/2)*[True]
    for i in xrange(len(is_prime)):
        if is_prime[i]:
            p = 2*i + 1
            primes.append(p)
            is_prime[i*p + i::p] = [False]*len(is_prime[i*p + i::p])
    return primes

I am using Python's list slicing to update my list of booleans is_prime. Each element is_prime[i] corresponds to an odd number 2*i + 1.

is_prime[i*p + i::p] = [False]*len(is_prime[i*p + i::p])

When I find a prime p, I can mark all elements corresponding to multiples of that prime False, and since all multiples smaller than p**2 are also multiples of smaller primes, I can skip marking those. The index of p**2 is i*p + i.

I'm worried about the cost of computing [False]*len(is_prime[i*p + 1::p]) and I have tried to compare it to two other strategies that I couldn't get to work.

For some reason, the formula (len(is_prime) - (i*p + i))/p (if positive) is not always equal to len(is_prime[i*p + i::p]). Is it because I've calculated the length of the slice wrong, or is there something subtle about slicing that I haven't caught?

When I use the following lines in my function:

print len(is_prime[i*p + i::p]), ((len(is_prime) - (i*p + i))/p)
is_prime[i*p + i::p] = [False]*((len(is_prime) - (i*p + i))/p)

I get the following output (case n = 50):

>>> eratosthenes2(50)
7 7
3 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in eratosthenes2
ValueError: attempt to assign sequence of size 2 to extended slice of size 3

I also tried replacing the bulk updating line with the following:

for j in xrange(i*p + i, len(is_prime), p):
    is_prime[j] = False

But this fails for large values of n because xrange doesn't take anything bigger than a long. I gave up on trying to wrestle itertools.count into what I needed.

Are there faster and more elegant ways to bulk-update the list slice? Is there anything I can do to fix the other strategies that I tried, so that I can compare them to the working one? Thanks!

Upvotes: 4

Views: 1479

Answers (1)

Kevin
Kevin

Reputation: 30161

Use itertools.repeat():

is_prime[i*p + 1::p] = itertools.repeat(False, len(is_prime[i*p + 1::p]))

The slicing syntax will iterate over whatever you put on the right-hand side; it doesn't need to be a full-blown sequence.

So let's fix that formula. I'll just borrow the Python 3 formula since we know that works:

1 + (hi - 1 - lo) / step

Since step > 0, hi = stop and lo = start, so we have:

1 + (len(is_prime) - 1 - (i*p + 1))//p

(// is integer division; this future-proofs our code for Python 3, but requires 2.7 to run).

Now, put it all together:

slice_len = 1 + (len(is_prime) - 1 - (i*p + 1))//p
is_prime[i*p + 1::p] = itertools.repeat(False, slice_len)

Python 3 users: Please do not use this formula directly. Instead, just write len(range(start, stop, step)). That gives the same result with similar performance (i.e. it's O(1)) and is much easier to read.

Upvotes: 2

Related Questions