Nikos Kazazakis

Reputation: 792

Not getting the expected speedup using OpenMP on non-trivial calculations

I'm learning OpenMP to parallelize part of my code, and I'm trying to figure out why it isn't faster with 2 threads than with 1. Here's a minimal working example:

#include <iostream>
#include <omp.h>

using namespace std;

class My_class
{
    public :

        // Constructor
        My_class(int nuIterations) 
            : prVar_(0),
              nuIters_(nuIterations)
        {} // Empty

        // Do something expensive involving the class' private vars
        void do_calculations()
        {
            for (int i=0;i<nuIters_;++i){
                prVar_=prVar_+i+2*i+3*i+4*i-5*i-4*i;
            }
        }

        // Retrieve result
        double getResult()
        {
            return prVar_;
        }

    private:

        double prVar_;
        int nuIters_;

};

int main()
{
    // Initialize one object for every thread
    My_class *test_object1, *test_object2;
    test_object1 = new My_class(1000000000);
    test_object2 = new My_class(500000000);

    // Set number of threads (use one line at a time)
    omp_set_num_threads(1); // One thread executes in 11.5 real seconds
    //omp_set_num_threads(2); // Two threads execute in 13.2 real seconds
    double start = omp_get_wtime(); // Start timer
#pragma omp parallel sections // Do calculations in parallel
    {
#pragma omp section
        {
            test_object1->do_calculations();
        }
#pragma omp section
        {
            test_object2->do_calculations();
        }
    }// End of parallel sections
    // Print results
    double end = omp_get_wtime();
    cout<<"Res 1 : "<<test_object1->getResult()<<endl;
    cout<<"Res 2 : "<<test_object2->getResult()<<endl;
    cout<<"Time  : "<<end-start<<endl;

    return 0;
}

Compiling and running this using g++ myomp.cpp -O0 -std=c++11 -fopenmp gives the following execution time for 1 and 2 threads:

  1. 1 thread : 11.5 seconds
  2. 2 threads: 13.2 seconds

Is there some way I can speed this up for 2 threads? I am running this on a 4-core Intel i7-4600U and Ubuntu.

EDIT: Changed most of the post so that it follows the guidelines.

Upvotes: 0

Views: 217

Answers (1)

Zulan

Reputation: 22640

There are two effects here:

  1. Cache line contention: You have two very small objects that are allocated in dynamic memory. If they end up in the same cache line (usually 64 bytes), the threads that update prVar_ compete for that line, because each core needs exclusive (write) access to it in its level 1 cache. You should see this as run-to-run variation: sometimes the program is significantly faster or slower, depending on where the objects land in memory. Try printing the pointer addresses and dividing them by 64; if both give the same quotient, the objects share a cache line. To address this issue, you need to pad / align the memory.

  2. You have a huge load imbalance. One task does twice as much work as the other, so even under ideal conditions the speedup is at most 1.5: the total work is 1.5 units, and the longer task alone takes 1 unit.
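
For the first effect, here is a minimal sketch of both the "divide the address by 64" check and one way to pad / align. The names PaddedAccumulator, cache_line_index and same_cache_line are made up for illustration, and the 64-byte line size is an assumption that holds on most current x86 CPUs but is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as mentioned in the answer.
constexpr std::size_t kCacheLine = 64;

// Hypothetical padded accumulator. alignas(64) forces 64-byte alignment
// and rounds sizeof up to 64, so two instances never share a cache line.
struct alignas(kCacheLine) PaddedAccumulator {
    double value = 0.0;
};

static_assert(sizeof(PaddedAccumulator) == kCacheLine,
              "each instance should occupy exactly one cache line");

// Index of the cache line an address falls into (address / 64).
std::uintptr_t cache_line_index(const void* p) {
    assert(p != nullptr);
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// True if both addresses fall into the same cache line.
bool same_cache_line(const void* a, const void* b) {
    return cache_line_index(a) == cache_line_index(b);
}
```

In the question's code you could call same_cache_line(test_object1, test_object2) right after the two new expressions to see whether a given run is affected. Declaring My_class with alignas(64) is one way to apply the padding, but note that before C++17 a plain new is not guaranteed to honor the extended alignment, so with -std=c++11 you may need stack objects or an aligned allocation instead.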
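
For the second effect, the 1.5 bound can be reproduced with a few lines of arithmetic. This is only a sketch: ideal_speedup is a hypothetical helper, and it assumes that run time is proportional to the iteration count and that each thread gets exactly one fixed task, as with the two sections in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Upper bound on the speedup when every thread runs one fixed task:
// serial time ~ sum of all tasks, parallel time ~ the longest task.
double ideal_speedup(const std::vector<double>& work_per_thread) {
    assert(!work_per_thread.empty());
    const double total = std::accumulate(work_per_thread.begin(),
                                         work_per_thread.end(), 0.0);
    const double longest = *std::max_element(work_per_thread.begin(),
                                             work_per_thread.end());
    assert(longest > 0.0);
    return total / longest;
}
```

With the iteration counts from the question, ideal_speedup({1e9, 5e8}) gives 1.5, while an even split such as ideal_speedup({7.5e8, 7.5e8}) gives 2. So giving both objects the same number of iterations, or dividing the work into more and smaller pieces, raises the ceiling before any cache effects come into play.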

Upvotes: 2
