Nikhil
Nikhil

Reputation: 71

c++ sieve of Eratosthenes my code is slow

I'm trying to find the number of prime numbers below 400 million but even with just 40 million my code is taking 8 secs to run. what am i doing wrong?

what can i do to make it faster?

#include<iostream>
#include<math.h>
#include<vector>
using namespace std;
int main()
{
    vector<bool> k;                         
    vector<long long int> c;                
    for (int i=2;i<40000000;i++)
    {
        k.push_back(true);                  
        c.push_back(i);
    }

    for ( int i=0;i<sqrt(40000000)+1;i++)                            
    {                                                               
        if (k[i]==true)                                              
       {                                                            
           for (int j=i+c[i];j<40000000;j=j+c[i])                  
           {                                                       
               k[j]=false; 
           }
       }
    }
    vector <long long int> arr;
    for ( int i=0;i<40000000-2;i++)
    {
        if (k[i]==true)
        {
            arr.push_back(c[i]);
        }
    }
    cout << arr.size() << endl ;
    return 0;
}

Upvotes: 4

Views: 516

Answers (3)

LookAheadAtYourTypes
LookAheadAtYourTypes

Reputation: 1699

  1. Remove vector c, you don't need it.
  2. Create vector k with known size at start. Repeatedly appending elements to a vector by invoking push_back() is a really bad idea from a performance point of view, as it can cause repeated memory reallocations and copies.
  3. http://primesieve.org/segmented_sieve.html - segmented version for inspiration.
  4. You can skip processing multiples of 2 and 3. link from code review
  5. It looks that you've got some issue in compiler optimization flag settings. Maybe you didn't change configuration from debug to release. What is your release speedup vs debug one?

Upvotes: 4

johnbakers
johnbakers

Reputation: 24771

I profiled your code as well as a simple tweak, below. The tweak is more than twice as fast:

    auto start = std::chrono::high_resolution_clock::now();

    //original version
    vector<bool> k;                         
    vector<long long int> c;                
    for (int i=2;i<40000000;i++)
      {
        k.push_back(true);                  
        c.push_back(i);
      }

    for ( int i=0;i<sqrt(40000000)+1;i++)                            
      {                                                               
        if (k[i]==true)                                              
          {                                                            
            for (int j=i+c[i];j<40000000;j=j+c[i])                  
              {                                                       
                k[j]=false; 
              }
          }
      }
    vector <long long int> arr;
    for ( int i=0;i<40000000-2;i++)
      {
        if (k[i]==true)
          {
            arr.push_back(c[i]);
          }
      }
    cout << arr.size() << endl ;


    auto end1 = std::chrono::high_resolution_clock::now();
    std::cout << "Elapsed = " << 
      std::chrono::duration_cast<std::chrono::milliseconds>(end1 - start).count() <<
      std::endl;

  }

  {

    auto begin = std::chrono::high_resolution_clock::now();

    //new version

    const long limit{40000000};
    vector<bool> k(limit-1,true);  

    //k[0] is the number 0
    k[0]=false; k[1]=false;

    auto sq = sqrt(limit) + 1;

    //start at the number 2
    for ( int i=2;i<sq;i++)                            
      {                                                               
        if (k[i]==true)                                              
          {                                                            
            for (int j=i+i;j<limit;j+=i)                  
              {                                                       
                k[j]=false; 
              }
          }
      }


    vector <long long int> arr;
    for ( int i=0;i<limit-2;i++)
      {
        if (k[i]==true)
          {
            arr.push_back(i);
          }
      }
    cout << arr.size() << endl ;



    auto stop = std::chrono::high_resolution_clock::now();
    std::cout << "Elapsed = " << 
      std::chrono::duration_cast<std::chrono::milliseconds>(stop - begin).count() <<
      std::endl;

  }

Here is the output (elapsed in milliseconds), in Debug mode:

2433654
Elapsed = 5787
2433654
Elapsed = 2432

Both have same results, second is much faster.

Here is another version using some nice C++ features (requiring less code), and it is about 11% faster than the second version above:

    auto begin = std::chrono::high_resolution_clock::now();

    const long limit{40000000};
    vector<int> k(limit-1,0);

    //fill with sequence of integers
    std::iota(k.begin(),k.end(),0);

    //k[0] is the number 0
    //integers reset to 0 are not prime
    k[0]=0; k[1]=0;

    auto sq = sqrt(limit) + 1;

    //start at the number 2
    for (int i=2;i<sq;i++)                            
      {                                                               
        if (k[i])
          {                                                            
            for (int j=i+i;j<limit;j+=i)                  
              {                                                       
                k[j]=0; 
              }
          }
      }

    auto results = std::remove(k.begin(),k.end(),0);

    cout << results - k.begin() << endl ;


    auto stop = std::chrono::high_resolution_clock::now();
    std::cout << "Elapsed = " << 
      std::chrono::duration_cast<std::chrono::milliseconds>(stop - begin).count() <<
      std::endl;

  }

Note that in your original version, you push_back in three different places, while this use of modern idioms never uses push_back at all when operating on the vectors.

In this example, the vector is of ints so that you have the actual list of prime numbers when you are finished.

Output:

2433654
Elapsed = 2160

These above are all Debug mode numbers.

In Release mode, the best is a combination of the second and third techniques above, using the numeric with a vector of bools, if you don't care what the actual prime numbers are in the end:

2433654
Elapsed = 1098
2433654
Elapsed bool remove= 410
2433654
Elapsed = 779

Note that your original code only takes about 1 second on my 5 year-old laptop in Release mode, so you are probably running in Debug mode.

Upvotes: 5

Aaron
Aaron

Reputation: 2364

I got it down from taking 10 seconds to run to just half a second on my computer by changing two things. First, I'm guessing you didn't compile it with optimization enabled. That brought it from 10 seconds down to 1 second for me. Second, the vector c is unnecessary. Everywhere you have c[i] in your code you can replace it with i+2. This will make it run twice as fast.

Upvotes: 4

Related Questions