user1475838

Reputation: 21

OpenMP - sections

I wrote an application using OpenMP. I created two sections and put one object in each. Each object calls a method that runs for roughly 22-23 seconds. The two sections are independent.

When I set num_threads(1), the application takes 46 seconds to run. That's ok, because 2×23=46.

When I set num_threads(2), the application takes 35 seconds to run, but I was expecting ~25 seconds.

As I said, the sections are independent: cm1 and cm2 don't use any external variables. So could anyone tell me why my app is 10 seconds slower than I expected? Is there some low-level synchronization going on?

t1 = clock();
#pragma omp parallel num_threads(2)
{
    #pragma omp sections
    {
        #pragma omp section
        {
            Cam cm1;
            cm1.solveUsingCost();
        }

        #pragma omp section
        {
            Cam cm2;
            cm2.solveUsingTime();
        }
    }
}
t2 = clock();

Upvotes: 2

Views: 666

Answers (2)

Pedro

Reputation: 1384

Judging from your replies to the previous answer and comments, my guess is that your two functions, solveUsingCost() and solveUsingTime(), are quite memory-intensive, or at least limited by memory bandwidth.

What exactly are you computing? And how? What is, roughly, the ratio of arithmetic operations to memory accesses? What does your memory access pattern look like, e.g. do you run through a large array several times?

Upvotes: 0

Haatschii

Reputation: 9319

How many CPUs or cores do you have? If, for example, you have only 2 physical cores, one of them also has to run all the other programs and the OS, which will slow down one of the threads.

Another possibility is that your CPU's L3 cache is large enough to hold all the data of one calculation at a time. When you run two in parallel, twice as much memory is used, so some data may have to be evicted from the L3 cache to RAM (note that most multicore CPUs share the L3 cache between cores). This would slow down your calculations a lot and could lead to the results you describe.

However, these are only guesses; there could be many other reasons why you don't get a factor-of-2 speedup when running your calculation in parallel.

Update: Something I forgot until you mentioned that your CPU is an i5: i5 and i7 processors have a feature called "Turbo Boost" that increases their clock speed, in your case from 3.3 to 3.6 GHz. However, this is only done when most cores are idle (for thermal reasons, I think), so only a single busy core gets boosted. Therefore two cores will not deliver twice the speed of one core, because they run at a lower clock speed.

Upvotes: 2
