Robotex

Reputation: 1026

OpenMP: parallel program is not faster (or not much faster) than the serial one. What am I doing wrong?

Look at this code:

#include <stdio.h>
#include <omp.h>

int main()
{
    long i, j;

    #pragma omp for
    for(i=0;i<=100000;i++)
    {
        for(j=0;j<=100000;j++)
        {
            if((i ^ j) == 5687)
            {
                //printf("%ld ^ %ld\n", i, j);
                break;
            }
        }
    }
}

Here is the result:

robotex@robotex-work:~/Projects$ gcc test.c -fopenmp -o test_openmp
robotex@robotex-work:~/Projects$ gcc test.c -o test_noopenmp
robotex@robotex-work:~/Projects$ time ./test_openmp
real    0m11.785s
user    0m11.613s
sys 0m0.008s
robotex@robotex-work:~/Projects$ time ./test_noopenmp

real    0m13.364s
user    0m13.253s
sys 0m0.008s
robotex@robotex-work:~/Projects$ time ./test_noopenmp

real    0m11.955s
user    0m11.853s
sys 0m0.004s
robotex@robotex-work:~/Projects$ time ./test_openmp

real    0m15.048s
user    0m14.949s
sys 0m0.004s

What's wrong? Why is the OpenMP program slower? How can I fix it?

I tested it on several computers (Intel Core i5 at work, Intel Core2Duo T7500 at home) running Ubuntu and always got the same result: OpenMP doesn't give a significant performance gain.

I also tested the example from Wikipedia and got the same result.

Upvotes: 1

Views: 4757

Answers (1)

Mysticial

Reputation: 471289

There are two issues in your code:

  1. You're missing the parallel in your pragma, so the loop runs on only 1 thread.
  2. Once the loop does run in parallel, you have a race condition on j because it's declared outside the parallel region.

First, you need parallel to actually make OpenMP run in parallel:

#pragma omp parallel for
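
To see the difference for yourself, here is a minimal sketch (not part of the original code) that reports how many threads each pragma actually gets. The function names team_size_for_only, team_size_parallel_for and current_team_size are made up for this illustration; the only OpenMP call used is omp_get_num_threads(). The #ifdef _OPENMP guard is an assumption added so the file still builds when -fopenmp is left off.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of threads in the current team.
   Falls back to 1 when the file is compiled without -fopenmp. */
static int current_team_size(void)
{
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

/* "omp for" on its own: there is no enclosing parallel region,
   so the loop is executed by the single initial thread. */
int team_size_for_only(void)
{
    int size = 0;
    long i;

    #pragma omp for
    for (i = 0; i < 4; i++)
    {
        size = current_team_size();
    }
    assert(size >= 1);
    return size;
}

/* "omp parallel for": a team of threads is created and the
   iterations are divided among them. */
int team_size_parallel_for(void)
{
    int size = 0;
    long i;

    #pragma omp parallel for
    for (i = 0; i < 4; i++)
    {
        /* size is shared, so the write is protected. */
        #pragma omp critical
        size = current_team_size();
    }
    assert(size >= 1);
    return size;
}
```

Call both functions from your own main and print the results. Built with gcc -fopenmp, the first one should return 1, while the second returns whatever team size the runtime picks (typically the number of cores, or OMP_NUM_THREADS if it is set).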

Secondly, you are declaring j outside the parallel region. This will make it shared among all the threads. So all the threads read and modify it inside the parallel region.

So not only do you have a race condition, but the cache coherence traffic caused by all the invalidations of the shared j is killing your performance.

What you need to do is to make j local to each thread. This can be done by either:

  1. Declaring j inside the parallel region.
  2. Or adding private(j) to the pragma: #pragma omp parallel for private(j)
    (as pointed out by @ArjunShankar in the comments)

Try this instead:

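#include <stdio.h>
#include <omp.h>

int main()
{
    // Wall-clock time before the loop.
    double start = omp_get_wtime();

    long i;

    // "parallel" creates the thread team; "for" splits the i-loop across it.
#pragma omp parallel for
    for(i=0;i<=100000;i++)
    {
        // Declared inside the parallel region, so each thread has its own j.
        long j;
        for(j=0;j<=100000;j++)
        {
            if((i ^ j) == 5687)
            {
                //printf("%ld ^ %ld\n", i, j);
                break;
            }
        }
    }

    double end = omp_get_wtime();

    // Elapsed time in seconds.
    printf("%f\n",end - start);
    return 0;
}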

Timings (seconds):

No OpenMP:            6.433378
OpenMP with global j: 9.634591
OpenMP with local j:  2.266667
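
If you'd rather keep j declared at the top of the function, the private(j) route could look roughly like the sketch below. It wraps the same loop in a function so that it returns something checkable. The name count_matches, its parameters n and target, and the reduction(+:count) clause are additions for this illustration and not part of the original code.

```c
#include <assert.h>

/* Hypothetical helper: counts how many i in [0, n] have at least
   one j in [0, n] with (i ^ j) == target. */
long count_matches(long n, long target)
{
    long i, j;        /* both declared outside the parallel region */
    long count = 0;

    assert(n >= 0);

    /* private(j) gives every thread its own copy of j.
       The loop variable i is made private automatically.
       reduction(+:count) is added here only so that the shared
       counter is updated without a race. */
    #pragma omp parallel for private(j) reduction(+:count)
    for (i = 0; i <= n; i++)
    {
        for (j = 0; j <= n; j++)
        {
            if ((i ^ j) == target)
            {
                count++;
                break;   /* leaves the inner loop only */
            }
        }
    }
    return count;
}
```

Dropping private(j) from the pragma brings back the shared j described above, and the count is then no longer reliable.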

Upvotes: 17
