John Smith
John Smith

Reputation: 12246

Segmented Sieve of Eratosthenes?

It's easy enough to make a simple sieve:

for (int i=2; i<=N; i++){
    if (sieve[i]==0){
        cout << i << " is prime" << endl;
        for (int j = i; j<=N; j+=i){
            sieve[j]=1;
        }
    }
    cout << i << " has " << sieve[i] << " distinct prime factors\n";
}

But what about when N is very large and I can't hold that kind of array in memory? I've looked up segmented sieve approaches and they seem to involve finding primes up until sqrt(N) but I don't understand how it works. What if N is very large (say 10^18)?

My goal is to acquire the number of unique prime factors per value k for large N, for each k up to N.

Upvotes: 43

Views: 35165

Answers (7)

Andy Richter
Andy Richter

Reputation: 189

Sieve in Python

Here is one in Python3. I used bitarray for the sieves because they are smaller. A 7 billion size sieve takes about a GB of memory. It takes about a second for primes in the hundreds of trillions. I ran a sieve to 534,543,645,364,535,567 and it took about a minute. 10^19+9 for a million wide segment took 4 minutes and 6.5 GB memory.

from bitarray import bitarray
import math
 
def sieve(high):
   limit = math.ceil(math.sqrt(high)+1)
   p = bitarray(limit+6) # extra for last 6k+1
   p.setall(True)
   p[0:2] = False   # Clear zero and one
   p[4::2] = False  # Clear multiples of 2
   p[9::3] = False  # Clear multiples of 3
   # Process only numbers of the form 6k-1 and 6k+1
   for i,h in ((k6-1,k6+1) for k6 in range(6,limit,6)):  
      if p[i]: p[i*i::2 * i] = False  # Clear multiples of 6k-1
      if p[h]: p[h*h::2 * h] = False  # Clear multiples of 6k+1 
   print(f"Created sieve of {p.count():,} primes, ",end='')
   print(f"max is {p.find(bitarray('1'),right=True)}") 
   return p
    
# in segmented sieve we find for primes from range [low, high]
def segmentedSieve(low, high):
   low_primes = sieve(high)
   segment = bitarray(high-low + 1)
   segment.setall(True)
   # here segment[n] indicates whether low+n is prime or not 
   # we don't count from zero. so start must be found for each prime
   for prm in low_primes.search(bitarray('1')): # iterate low primes
      lower = (low//prm) # search for first multiple of prm above low
      if lower <= 1:
         lower = prm+prm # for 2,3 we need [4::2] = False
      else:
         lower = (lower*prm + prm) if (low % prm) else (lower*prm) 
      lower-=low # re-scale to our segment start
      segment[lower::prm] = False # clear multiple of this prime
   return [k+low for k in segment.search(bitarray('1'))]
 
if __name__ == '__main__':
   print("Finds a range of primes") 
   low = int(input("Enter start of range> "))
   skip = int(input("How many numbers in range> "))
   high = low+skip
   print(f"Primes in Range {low:,} to {high:,} are" )
   segment = [f"{x:,}" for x in segmentedSieve(low, high)]
   print(*segment)

Output in the quadrillions:

$ python seg_sieve.py
Finds a range of primes
Enter start of range> 1234567890123456
How many numbers in range> 323
Primes in Range 1,234,567,890,123,456 to 1,234,567,890,123,779 are
Created sieve of 2,154,563 primes, max is 35136397
1,234,567,890,123,481 1,234,567,890,123,493 1,234,567,890,123,527 1,234,567,890,123,583 1,234,567,890,123,647 1,234,567,890,123,661 1,234,567,890,123,671 1,234,567,890,123,697 1,234,567,890,123,703 1,234,567,890,123,749 1,234,567,890,123,773

Here is 10^18+3 for a segment 2000 long:

$ time python seg_sieve.py
Finds a range of primes
Enter start of range> 1000000000000000003
How many numbers in range> 2000
Primes in Range 1,000,000,000,000,000,003 to 1,000,000,000,000,002,003 are
Created sieve of 50,847,535 primes, max is 1000000007
1,000,000,000,000,000,003 1,000,000,000,000,000,009 1,000,000,000,000,000,031 1,000,000,000,000,000,079 1,000,000,000,000,000,177 1,000,000,000,000,000,183 1,000,000,000,000,000,201 1,000,000,000,000,000,283 1,000,000,000,000,000,381 1,000,000,000,000,000,387 1,000,000,000,000,000,507 1,000,000,000,000,000,523 1,000,000,000,000,000,583 1,000,000,000,000,000,603 1,000,000,000,000,000,619 1,000,000,000,000,000,621 1,000,000,000,000,000,799 1,000,000,000,000,000,841 1,000,000,000,000,000,861 1,000,000,000,000,000,877 1,000,000,000,000,000,913 1,000,000,000,000,000,931 1,000,000,000,000,000,997 1,000,000,000,000,001,093 1,000,000,000,000,001,191 1,000,000,000,000,001,267 1,000,000,000,000,001,323 1,000,000,000,000,001,347 1,000,000,000,000,001,359 1,000,000,000,000,001,453 1,000,000,000,000,001,459 1,000,000,000,000,001,537 1,000,000,000,000,001,563 1,000,000,000,000,001,593 1,000,000,000,000,001,659 1,000,000,000,000,001,683 1,000,000,000,000,001,729 1,000,000,000,000,001,743 1,000,000,000,000,001,771 1,000,000,000,000,001,827 1,000,000,000,000,001,879 1,000,000,000,000,001,953

real    1m11.307s
user    1m3.622s
sys     0m0.640s

Upvotes: 0

Bruno Simas Hadlich
Bruno Simas Hadlich

Reputation: 69

Based on Swapnil Kumar answer I did the following algorithm in C. It was built with mingw32-make.exe.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    const int MAX_PRIME_NUMBERS = 5000000;//The number of prime numbers we are looking for
    long long *prime_numbers = malloc(sizeof(long long) * MAX_PRIME_NUMBERS);
    prime_numbers[0] = 2;
    prime_numbers[1] = 3;
    prime_numbers[2] = 5;
    prime_numbers[3] = 7;
    prime_numbers[4] = 11;
    prime_numbers[5] = 13;
    prime_numbers[6] = 17;
    prime_numbers[7] = 19;
    prime_numbers[8] = 23;
    prime_numbers[9] = 29;
    const int BUFFER_POSSIBLE_PRIMES = 29 * 29;//Because the greatest prime number we have is 29 in the 10th position so I started with a block of 841 numbers
    int qt_calculated_primes = 10;//10 because we initialized the array with the ten first primes
    int possible_primes[BUFFER_POSSIBLE_PRIMES];//Will store the booleans to check valid primes
    long long iteration = 0;//Used as multiplier to the range of the buffer possible_primes
    int i;//Simple counter for loops
    while(qt_calculated_primes < MAX_PRIME_NUMBERS)
    {
        for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
            possible_primes[i] = 1;//set the number as prime

        int biggest_possible_prime = sqrt((iteration + 1) * BUFFER_POSSIBLE_PRIMES);

        int k = 0;

        long long prime = prime_numbers[k];//First prime to be used in the check

        while (prime <= biggest_possible_prime)//We don't need to check primes bigger than the square root
        {
            for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
                if ((iteration * BUFFER_POSSIBLE_PRIMES + i) % prime == 0)
                    possible_primes[i] = 0;

            if (++k == qt_calculated_primes)
                break;

            prime = prime_numbers[k];
        }
        for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
            if (possible_primes[i])
            {
                if ((qt_calculated_primes < MAX_PRIME_NUMBERS) && ((iteration * BUFFER_POSSIBLE_PRIMES + i) != 1))
                {
                    prime_numbers[qt_calculated_primes] = iteration * BUFFER_POSSIBLE_PRIMES + i;
                    printf("%d\n", prime_numbers[qt_calculated_primes]);
                    qt_calculated_primes++;
                } else if (!(qt_calculated_primes < MAX_PRIME_NUMBERS))
                    break;
            }

        iteration++;
    }

    return 0;
}

It set a maximum of prime numbers to be found, then an array is initialized with known prime numbers like 2, 3, 5...29. So we make a buffer that will store the segments of possible primes, this buffer can't be greater than the power of the greatest initial prime that in this case is 29.

I'm sure there are a plenty of optimizations that can be done to improve the performance like parallelize the segments analysis process and skip numbers that are multiple of 2, 3 and 5 but it serves as an example of low memory consumption.

Upvotes: 1

Amitesh mani tiwari
Amitesh mani tiwari

Reputation: 549

A number is prime if none of the smaller prime numbers divides it. Since we iterate over the prime numbers in order, we already marked all numbers, who are divisible by at least one of the prime numbers, as divisible. Hence if we reach a cell and it is not marked, then it isn't divisible by any smaller prime number and therefore has to be prime.

Remember these points:-

// Generating all prime number up to  R
 
// creating an array of size (R-L-1) set all elements to be true: prime && false: composite
     

#include<bits/stdc++.h>

using namespace std;

#define MAX 100001

vector<int>* sieve(){
    bool isPrime[MAX];

    for(int i=0;i<MAX;i++){
        isPrime[i]=true;
    }
 for(int i=2;i*i<MAX;i++){
     if(isPrime[i]){
         for(int j=i*i;j<MAX;j+=i){
             isPrime[j]=false;
         }
     }
 }

 vector<int>* primes = new vector<int>();
 primes->push_back(2);
 for(int i=3;i<MAX;i+=2){
     if(isPrime[i]){
     primes->push_back(i);
     }
}

 return primes;
}

void printPrimes(long long l, long long r, vector<int>*&primes){
      bool isprimes[r-l+1];
      for(int i=0;i<=r-l;i++){
          isprimes[i]=true;
      }

      for(int i=0;primes->at(i)*(long long)primes->at(i)<=r;i++){

          int currPrimes=primes->at(i);
          //just smaller or equal value to l
          long long base =(l/(currPrimes))*(currPrimes);
      

          if(base<l){
              base=base+currPrimes;
          }
    
    //mark all multiplies within L to R as false

          for(long long j=base;j<=r;j+=currPrimes){
              isprimes[j-l]=false;
          }

   //there may be a case where base is itself a prime number

          if(base==currPrimes){
              isprimes[base-l]= true;
          }
    }

          for(int i=0;i<=r-l;i++){
              if(isprimes[i]==true){
                  cout<<i+l<<endl;
              }
          

      }
}
int main(){
    vector<int>* primes=sieve();
   
   int t;
   cin>>t;
   while(t--){
       long long l,r;
       cin>>l>>r;
       printPrimes(l,r,primes);
   }

    return 0;

}

Upvotes: 0

Tomasz Andel
Tomasz Andel

Reputation: 173

If someone would like to see C++ implementation, here is mine:

void sito_delta( int delta, std::vector<int> &res)
{

std::unique_ptr<int[]> results(new int[delta+1]);
for(int i = 0; i <= delta; ++i)
    results[i] = 1;

int pierw = sqrt(delta);
for (int j = 2; j <= pierw; ++j)
{
    if(results[j])
    {
        for (int k = 2*j; k <= delta; k+=j)
        {
            results[k]=0;
        }
    }
}

for (int m = 2; m <= delta; ++m)
    if (results[m])
    {
        res.push_back(m);
        std::cout<<","<<m;
    }
};
void sito_segment(int n,std::vector<int> &fiPri)
{
int delta = sqrt(n);

if (delta>10)
{
    sito_segment(delta,fiPri);
   // COmpute using fiPri as primes
   // n=n,prime = fiPri;
      std::vector<int> prime=fiPri;
      int offset = delta;
      int low = offset;
      int high = offset * 2;
      while (low < n)
      {
          if (high >=n ) high = n;
          int mark[offset+1];
          for (int s=0;s<=offset;++s)
              mark[s]=1;

          for(int j = 0; j< prime.size(); ++j)
          {
            int lowMinimum = (low/prime[j]) * prime[j];
            if(lowMinimum < low)
                lowMinimum += prime[j];

            for(int k = lowMinimum; k<=high;k+=prime[j])
                mark[k-low]=0;
          }

          for(int i = low; i <= high; i++)
              if(mark[i-low])
              {
                fiPri.push_back(i);
                std::cout<<","<<i;
              }
          low=low+offset;
          high=high+offset;
      }
}
else
{

std::vector<int> prime;
sito_delta(delta, prime);
//
fiPri = prime;
//
int offset = delta;
int low = offset;
int high = offset * 2;
// Process segments one by one 
while (low < n)
{
    if (high >= n) high = n;
    int  mark[offset+1];
    for (int s = 0; s <= offset; ++s)
        mark[s] = 1;

    for (int j = 0; j < prime.size(); ++j)
    {
        // find the minimum number in [low..high] that is
        // multiple of prime[i] (divisible by prime[j])
        int lowMinimum = (low/prime[j]) * prime[j];
        if(lowMinimum < low)
            lowMinimum += prime[j];

        //Mark multiples of prime[i] in [low..high]
        for (int k = lowMinimum; k <= high; k+=prime[j])
            mark[k-low] = 0;
    }

    for (int i = low; i <= high; i++)
        if(mark[i-low])
        {
            fiPri.push_back(i);
            std::cout<<","<<i;
        }
    low = low + offset;
    high = high + offset;
}
}
};

int main()
{
std::vector<int> fiPri;
sito_segment(1013,fiPri);
}

Upvotes: 2

user448810
user448810

Reputation: 17866

The basic idea of a segmented sieve is to choose the sieving primes less than the square root of n, choose a reasonably large segment size that nevertheless fits in memory, and then sieve each of the segments in turn, starting with the smallest. At the first segment, the smallest multiple of each sieving prime that is within the segment is calculated, then multiples of the sieving prime are marked as composite in the normal way; when all the sieving primes have been used, the remaining unmarked numbers in the segment are prime. Then, for the next segment, for each sieving prime you already know the first multiple in the current segment (it was the multiple that ended the sieving for that prime in the prior segment), so you sieve on each sieving prime, and so on until you are finished.

The size of n doesn't matter, except that a larger n will take longer to sieve than a smaller n; the size that matters is the size of the segment, which should be as large as convenient (say, the size of the primary memory cache on the machine).

You can see a simple implementation of a segmented sieve here. Note that a segmented sieve will be very much faster than O'Neill's priority-queue sieve mentioned in another answer; if you're interested, there's an implementation here.

EDIT: I wrote this for a different purpose, but I'll show it here because it might be useful:

Though the Sieve of Eratosthenes is very fast, it requires O(n) space. That can be reduced to O(sqrt(n)) for the sieving primes plus O(1) for the bitarray by performing the sieving in successive segments. At the first segment, the smallest multiple of each sieving prime that is within the segment is calculated, then multiples of the sieving prime are marked composite in the normal way; when all the sieving primes have been used, the remaining unmarked numbers in the segment are prime. Then, for the next segment, the smallest multiple of each sieving prime is the multiple that ended the sieving in the prior segment, and so the sieving continues until finished.

Consider the example of sieve from 100 to 200 in segments of 20. The five sieving primes are 3, 5, 7, 11 and 13. In the first segment from 100 to 120, the bitarray has ten slots, with slot 0 corresponding to 101, slot k corresponding to 100+2k+1, and slot 9 corresponding to 119. The smallest multiple of 3 in the segment is 105, corresponding to slot 2; slots 2+3=5 and 5+3=8 are also multiples of 3. The smallest multiple of 5 is 105 at slot 2, and slot 2+5=7 is also a multiple of 5. The smallest multiple of 7 is 105 at slot 2, and slot 2+7=9 is also a multiple of 7. And so on.

Function primesRange takes arguments lo, hi and delta; lo and hi must be even, with lo < hi, and lo must be greater than sqrt(hi). The segment size is twice delta. Ps is a linked list containing the sieving primes less than sqrt(hi), with 2 removed since even numbers are ignored. Qs is a linked list containing the offest into the sieve bitarray of the smallest multiple in the current segment of the corresponding sieving prime. After each segment, lo advances by twice delta, so the number corresponding to an index i of the sieve bitarray is lo + 2i + 1.

function primesRange(lo, hi, delta)
    function qInit(p)
        return (-1/2 * (lo + p + 1)) % p
    function qReset(p, q)
        return (q - delta) % p
    sieve := makeArray(0..delta-1)
    ps := tail(primes(sqrt(hi)))
    qs := map(qInit, ps)
    while lo < hi
        for i from 0 to delta-1
            sieve[i] := True
        for p,q in ps,qs
            for i from q to delta step p
                sieve[i] := False
        qs := map(qReset, ps, qs)
        for i,t from 0,lo+1 to delta-1,hi step 1,2
            if sieve[i]
                output t
        lo := lo + 2 * delta

When called as primesRange(100, 200, 10), the sieving primes ps are [3, 5, 7, 11, 13]; qs is initially [2, 2, 2, 10, 8] corresponding to smallest multiples 105, 105, 105, 121 and 117, and is reset for the second segment to [1, 2, 6, 0, 11] corresponding to smallest multiples 123, 125, 133, 121 and 143.

You can see this program in action at http://ideone.com/iHYr1f. And in addition to the links shown above, if you are interested in programming with prime numbers I modestly recommend this essay at my blog.

Upvotes: 60

OneWhoKnocks
OneWhoKnocks

Reputation: 91

It's just that we are making segmented with the sieve we have. The basic idea is let's say we have to find out prime numbers between 85 and 100. We have to apply the traditional sieve,but in the fashion as described below:

So we take the first prime number 2 , divide the starting number by 2(85/2) and taking round off to smaller number we get p=42,now multiply again by 2 we get p=84, from here onwards start adding 2 till the last number.So what we have done is that we have removed all the factors of 2(86,88,90,92,94,96,98,100) in the range.

We take the next prime number 3 , divide the starting number by 3(85/3) and taking round off to smaller number we get p=28,now multiply again by 3 we get p=84, from here onwards start adding 3 till the last number.So what we have done is that we have removed all the factors of 3(87,90,93,96,99) in the range.

Take the next prime number=5 and so on.................. Keep on doing the above steps.You can get the prime numbers (2,3,5,7,...) by using the traditional sieve upto sqrt(n).And then use it for segmented sieve.

Upvotes: 7

Fred Foo
Fred Foo

Reputation: 363627

There's a version of the Sieve based on priority queues that yields as many primes as you request, rather than all of them up to an upper bound. It's discussed in the classic paper "The Genuine Sieve of Eratosthenes" and googling for "sieve of eratosthenes priority queue" turns up quite a few implementations in various programming languages.

Upvotes: 5

Related Questions