How can I optimize sieve of eratosthenes to just store prime numbers for a very large range?

I have studied the working of Sieve of Eratosthenes for generating the prime numbers up to a given number using iteration and striking off all the composite numbers. And the algorithm needs just to be iterated up to sqrt(n) where n is the upper bound upto which we need to find all the prime numbers. We know that the number of prime numbers up to n=10^9 is very less as compared to the number of composite numbers. So we use all the space to just tell that these numbers are not prime first by marking them composite. My question is can we modify the algorithm to just store prime numbers since we deal with a very large range (since number of primes are very less)? Can we just store straight away the prime numbers?

Upvotes: 2

Answers (5)

rossum

Reputation: 15693

You can halve the size of the sieve by only 'storing' the odd numbers. This requires code to explicitly deal with the case of testing even numbers. For odd numbers, bit b of the sieve represents n = 2b + 3. Hence bit 0 represents 3, bit 1 represents 5 and so on. There is a small overhead in converting between the number n and the bit index b.

Whether this technique is any use to you depends on the memory/speed balance you require.

Upvotes: 0

DarthGizka

Reputation: 4675

Changing the structure from that of a set (sieve) - one bit per candidate - to storing primes (e.g. in a list, vector or tree structure) actually increases storage requirements.

Example: there are 203.280.221 primes below 2^32. An array of uint32_t of that size requires about 775 MiB whereas the corresponding bitmap (a.k.a. set representation) occupies only 512 MiB (2^32 bits / 8 bits/byte = 2^29 bytes).

The most compact number-based representation with fixed cell size would be storing the halved distance between consecutive odd primes, since up to about 2^40 the halved distance fits into a byte. At 193 MiB for the primes up to 2^32 this is slightly smaller than an odds-only bitmap but it is only efficient for sequential processing. For sieving it is not suitable because, as Anatolijs has pointed out, algorithms like the Sieve of Eratosthenes effectively require a set representation.

The bitmap can be shrunk drastically by leaving out the multiples of small primes. Most famous is the odds-only representation that leaves out the number 2 and its multiples; this halves the space requirement to 256 MiB at virtually no cost in added code complexity. You just need to remember to pull the number 2 out of thin air when needed, since it isn't represented in the sieve.

Even more space can be saved by leaving out multiples of more small primes; this generalisation of the 'odds-only' trick is usually called 'wheeled storage' (see Wheel Factorization in the Wikipedia). However, the gain from adding more small primes to the wheel gets smaller and smaller whereas the wheel modulus ('circumference') increases explosively. Adding 3 removes 1/3rd of the remaining numbers, adding 5 removes a further 1/5th, adding 7 only gets you a further 1/7th and so on.

Here's an overview of what adding another prime to the wheel can get you. 'ratio' is the size of the wheeled/reduced set relative to the full set that represents every number; 'delta' gives the shrinkage compared to the previous step. 'spokes' refers to the number of prime-bearing spokes which need to be represented/stored; the total number of spokes for a wheel is of course equal to its modulus (circumference).

The mod 30 wheel (about 136 MiB for the primes up to 2^32) offers an excellent cost/benefit ratio because it has eight prime-bearing spokes, which means that there is a one-to-one correspondence between wheels and 8-bit bytes. This enables many efficient implementation tricks. However, its cost in added code complexity is considerable despite this fortuitous circumstance, and for many purposes the odds-only sieve ('mod 2 wheel') gives the most bang for buck by far.

There are two additional considerations worth keeping in mind. The first is that data sizes like these often exceed the capacity of memory caches by a wide margin, so that programs can often spend a lot of time waiting for the memory system to deliver the data. This is compounded by the typical access patterns of sieving - striding over the whole range, again and again and again. Speedups of several orders of magnitude are possible by working the data in small batches that fit into the level-1 data cache of the processor (typically 32 KiB); lesser speedups are still possible by keeping within the capacity of the L2 and L3 caches (a few hundred KiB and a few MiB, respectively). The keyword here is 'segmented sieving'.

The second consideration is that many sieving tasks - like the famous SPOJ PRIME1 and its updated version PRINT (with extended bounds and tightened time limit) - require only the small factor primes up to the square root of the upper limit to be permanently available for direct access. That's a comparatively small number: 3512 when sieving up to 2^31 as in the case of PRINT.

Since these primes have already been sieved there's no need for a set representation anymore, and since they are few there are no problems with storage space. This means they are most profitably kept as actual numbers in a vector or list for easy iteration, perhaps with additional, auxiliary data like current working offset and phase. The actual sieving task is then easily accomplished via a technique called 'windowed sieving'. In the case of PRIME1 and PRINT this can be several orders of magnitude faster than sieving the whole range up to the upper limit, since both tasks only asks for a small number of subranges to be sieved.

Upvotes: 8

vish4071

Reputation: 5277

If you have studied the sieve right, you must know we don't have the primes to begin with. We have an array, sizeof which is equal to the range. Now, if you want the range to be 10e9, you want this to be the size of the array. You have mentioned no language, but for each number, you must need a bit to represent whether it is prime or not.

Even that means you need 10^9 bits = 1.125 * 10^8 bytes which is greater than 100 MB of RAM.

Assuming you have all this, most optimized sieve takes O(n * log(log n)) time, which is, if n = 10e9, on a machine that evaluates 10e8 instructions per second, will still take some minutes.

Now, assuming you have all this with you, still, number of primes till 10e9 is q = 50,847,534, to save these will still take q * 4 bytes, which is still greater than 100MB. (more RAM)

Even if you remove the indexes which are multiples of 2, 3 or 5, this removes 21 numbers in every 30. This is not good enough, because in total, you will still need around 140 MB space. (40MB = a third of 10^9 bits + ~100MB for storing prime numbers).

So, since, for storing the primes, you will, in any case require similar amount of memory (of the same order as calculation), your question, IMO has no solution.

Upvotes: 0

Anatolijs Gorbunovs

Reputation: 69

You can do that (remove numbers that are detected to be non-prime from your array/linked list), but then time complexity of algorithm will degrade to O(N^2/log(N)) or something like that, instead of original O(N*log(N)). This is because you will not be able to say "the numbers 2X, 3X, 4X, ..." are not prime anymore. You will have to loop through your entire compressed list.

Upvotes: 1

joshpj1

Reputation: 186

You could erase each composite number from the array/vector once you have shown it to be composite. Or when you fill an array of numbers to put through the sieve, remove all even numbers (other than 2) and all numbers ending in 5.

Upvotes: 0

How can I optimize sieve of eratosthenes to just store prime numbers for a very large range?

Answers (5)

Related Questions