EPI: Optimized Algorithm for Generating Prime Numbers

Question

I am currently working through "Elements of Programming Interviews in Python" and have gotten stuck at this part. The code below generates the primes up to n. The explanation is rather lacking, and I do not have the mathematical background to make sense of it.

We can improve run-time by sieving p's multiples from p^2 instead of p, since all numbers of the form kp, where k < p have already been sieved out.

The code is below:

def generate_primesII(n):

    if n < 2:
        return []

    size = (n - 3) // 2 + 1
    primes = [2]  # stores the primes up to n; 2 is the only even prime

    # is_prime[i] represents whether (2i + 3) is prime or not
    # Initially set each to true. Then use sieving to eliminate nonprimes
    is_prime = [True] * size

    for i in range(size):
        if is_prime[i]:

            p = i * 2 + 3
            primes.append(p)

            # Sieving from p^2, where p^2 = (4i^2 + 12i + 9). The index in is_prime
            # is (2i^2 + 6i + 3) because is_prime[i] represents 2i + 3

            # note: the book warns that p^2 might overflow; Python ints are
            # arbitrary precision, so j cannot overflow here
            for j in range(2 * i**2 + 6 * i + 3, size, p):
                is_prime[j] = False
    return primes

My questions are:

1. How did they come up with the formula for size?
2. They say is_prime[i] represents whether (2i + 3) is prime or not. I can't make sense of why it is 2i + 3.
3. How did they get p = i * 2 + 3?
4. What does the following comment mean: "Sieving from p^2, where p^2 = (4i^2 + 12i + 9). The index in is_prime is (2i^2 + 6i + 3) because is_prime[i] represents 2i + 3"?
5. Why does the range of j begin with 2 * i**2 + 6 * i + 3?

Most of the numbers seem rather random to me

ilim · Accepted Answer

Two tricks are applied here at the same time, and I believe that is the main source of your confusion. The first is a mathematical fact about how the sieve algorithm progresses (i.e., crossing out can start from p²). The second is a trick to use less space by not storing any is_prime data for even numbers.

Let's start with your first two questions. The (2 * i + 3) mapping used in is_prime[i] is a space optimization that halves the memory used: no even number is represented in the is_prime list, since 2 is the only even prime and it is added to primes up front. Index i stands for the odd number 2i + 3, so indices 0, 1, 2, ... correspond to the values 3, 5, 7, ... The size formula simply counts the odd numbers from 3 to n. If you substitute the last valid index, size - 1, for i in (2i + 3), you end up with n (or n - 1 if n is even).
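To make the mapping concrete, here is a small sketch. The helper names index_to_value and sieve_size are mine, not from the book; sieve_size just wraps the size expression from your code.

```python
def index_to_value(i):
    # is_prime[i] stands for the odd number 2*i + 3
    return 2 * i + 3


def sieve_size(n):
    # Number of odd values in [3, n]; same expression as in the question's code
    return (n - 3) // 2 + 1


for n in (10, 11):
    size = sieve_size(n)
    values = [index_to_value(i) for i in range(size)]
    # The last value is n when n is odd and n - 1 when n is even
    print(n, size, values)
```

For n = 10 this lists 3, 5, 7, 9 and for n = 11 it lists 3, 5, 7, 9, 11, so the last index, size - 1, always maps to the largest odd number not exceeding n.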

Your third question is more straightforward. In the outer loop, i iterates over the indices of is_prime, not over the odd integers themselves. Because index i stands for the value 2i + 3, p is assigned that value. From that point on, p is the actual prime used in the inner loop.

The comment in your fourth question restates the mathematical idea of starting to sieve from p². Because i is an index rather than an actual value, p² is rewritten in terms of i: p² = (2i + 3)² = 4i² + 12i + 9. The comment is trying to justify using 2 * i**2 + 6 * i + 3 as the start of the range of j, but it is quite unclear.
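If the reason for starting at p² is not obvious, this sketch may help. It checks, for one example prime (p = 7, chosen arbitrarily), that every multiple k * p with 2 <= k < p has a prime factor smaller than p, which means an earlier pass of the sieve has already crossed it out. The helper smallest_prime_factor is my own illustration and is not part of the book's code.

```python
def smallest_prime_factor(m):
    # Trial division: return the smallest prime that divides m (m >= 2)
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m  # m itself is prime


p = 7
for k in range(2, p):
    m = k * p
    # Each of these multiples is divisible by a prime smaller than p,
    # so it was eliminated before the sieve ever reached p
    print(m, smallest_prime_factor(m))

# p * p is the first multiple whose smallest prime factor is p itself
print(p * p, smallest_prime_factor(p * p))
```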

To answer your final question, consider what j actually represents. Like i, j is an index into is_prime, not an actual value. The initial value of j is 2 * i**2 + 6 * i + 3 because substituting that value for i in (2*i + 3) (i.e., the mapping from indices to actual values) gives 4 * i**2 + 12 * i + 9, which is p².
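You can confirm the start index numerically. The sketch below inverts the mapping (value_to_index is a name I made up for illustration) and checks that the index of p² matches the formula for the first few values of i.

```python
def value_to_index(v):
    # Inverse of value = 2*i + 3; only meaningful for odd v >= 3
    return (v - 3) // 2


for i in range(5):
    p = 2 * i + 3
    start = 2 * i**2 + 6 * i + 3
    # The index that stands for p*p is exactly the start of the inner range
    assert value_to_index(p * p) == start
    print(i, p, p * p, start)
```

For example, i = 0 gives p = 3 and p² = 9, which lives at index (9 - 3) // 2 = 3, and the formula also gives 3.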

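The inner loop assigns is_prime[j] = False to all the cells that represent multiples of the actual prime value p, starting from the one representing the value p². Since neighbouring indices are 2 apart in value, stepping j by p moves the value by 2p, so only the odd multiples of p are visited; the even ones are not stored at all.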
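As a rough illustration of what the inner loop touches, the sketch below translates the indices it visits back into actual values. The function crossed_out_values is hypothetical and exists only for this demonstration; it reuses the range expression from your code.

```python
def crossed_out_values(i, size):
    # Same range as the inner loop in the question, but returning the
    # actual values (2*j + 3) instead of writing False into is_prime[j]
    p = 2 * i + 3
    start = 2 * i**2 + 6 * i + 3
    return [2 * j + 3 for j in range(start, size, p)]


# size = 20 covers the odd values 3..41
print(crossed_out_values(0, 20))  # p = 3: odd multiples of 3 from 9 upward
print(crossed_out_values(1, 20))  # p = 5: odd multiples of 5 from 25 upward
```

A step of p in index space is a step of 2p in value space, which is why the values for p = 3 go 9, 15, 21, ... and never land on an even multiple such as 12.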
