OpenMP is not doing actual parallelism when using GCC

Question

I am running Ubuntu 12 in VirtualBox and I am using GCC to compile this simple C program that has simple OpenMP pragmas:

#include <stdio.h>
#include <omp.h>
#define MAX 10000000000
void main()
{
    unsigned long long i,j,k,l;
    int threadnumber;
    #pragma omp parallel shared(i,j,k,l)
    {
        threadnumber = omp_get_thread_num();
        if(threadnumber == 0)
        {
            for(i = 0; i < (MAX / 4); i++)
                ;
        }
        else if(threadnumber == 1)
        {
            for(j = (MAX / 4); j < (MAX / 2); j++)
                ;
        }
        else if(threadnumber == 2)
        {
            for(k = (MAX / 2); k < (3 * (MAX / 4));k++)
                ;
        }
        else
        {
            for(l = (3 * (MAX / 4)); l < MAX; l++)
                ;
        }
    }
}

My processor is an Intel Core i5. The program does run in parallel (verified by adding some printf() calls), and I have set the environment variable OMP_NUM_THREADS to 4. The problem is that this code takes much more time than the following one, which is not parallel:

#include <stdio.h>
#define MAX 10000000000
void main()
{
    unsigned long long i;
    for(i = 0; i < MAX; i++)
        ;
}

I have also tried adding clock() calls before and after the loop in both versions, and I get a higher time in the parallel version. I have also measured the time using time ./a.out, and (in the parallel version only) the "real" time differs from what clock() returns!

I have also compiled both programs in Visual Studio, and here are the results:

In Debug mode: both programs take nearly equal times, close to what I get with GCC.
In Release mode: both programs are faster, and the parallel one shows a great improvement in time.

The problem in a nutshell:

1) I want to run the program in parallel with the same efficiency as in the release version of Visual Studio's compiler.
2) Is there a parameter or option that I should pass to GCC other than "-fopenmp" to make it build a release version exactly like Visual Studio?
3) I want to know if it is an Ubuntu problem or a GCC one, or what?

P.S.: I have tried the same procedure on Ubuntu installed with Wubi and on Ubuntu as a standalone OS (on an ext4 file system), both on the same platform, and I get the same results.

Hristo Iliev · Accepted Answer

I really don't understand why every new OpenMP-related question here on SO contains code that uses clock() to (wrongly) measure wall-clock execution time, given that OpenMP has a portable high-resolution timer available via a call to omp_get_wtime().
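
As a rough illustration of the difference (a sketch, not the asker's program): on Linux with glibc, clock() reports CPU time summed over all threads of the process, while omp_get_wtime() reports elapsed time. The helper names wall_seconds, busy_sum and timed_sum below are made up for this example, and every thread deliberately repeats the same loop so that CPU time grows with the thread count.

```c
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds. Falls back to the coarse time() when the file is
   built without -fopenmp, so the sketch still compiles. */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)time(NULL);
#endif
}

/* Hypothetical workload: EVERY thread in the team sums 0..n-1 on its own. */
static unsigned long long busy_sum(unsigned long long n)
{
    unsigned long long result = 0;
    #pragma omp parallel
    {
        unsigned long long local = 0;          /* private to each thread */
        for (unsigned long long i = 0; i < n; i++)
            local += i;
        #pragma omp critical
        result = local;                        /* all threads computed the same value */
    }
    return result;
}

/* Times one call with both clocks and prints the two readings. */
static unsigned long long timed_sum(unsigned long long n)
{
    clock_t c0 = clock();
    double  w0 = wall_seconds();
    unsigned long long result = busy_sum(n);
    double  w1 = wall_seconds();
    clock_t c1 = clock();

    printf("clock():  %.3f s of CPU time\n", (double)(c1 - c0) / CLOCKS_PER_SEC);
    printf("elapsed:  %.3f s of wall-clock time\n", w1 - w0);
    return result;
}
```

Calling timed_sum() with a large n from main(), built with -fopenmp and run with several threads, would be expected to show a CPU time well above the elapsed time, which matches the mismatch between clock() and the "real" figure of time ./a.out described in the question.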

First, using shared variables as loop counters inside a parallel region is a terrible terrible idea. Here is why, although you have a Nehalem or later microarchitecture based CPU which makes this less of a problem.
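
A minimal sketch of the alternative, assuming the goal is simply that each thread walks its own slice of the range: declare the counters inside the parallel region so each thread gets a private copy. In the question's code i, j, k and l are shared and, being adjacent, likely sit in the same cache line; threadnumber is declared outside the region too, so it is shared as well and the threads race when writing it. The function name count_slices is hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Each thread counts the iterations of its own slice of [0, max);
   the partial counts are added up by the reduction clause. */
static unsigned long long count_slices(unsigned long long max)
{
    unsigned long long total = 0;
    #pragma omp parallel reduction(+:total)
    {
        /* Declared inside the region, so these are private to each thread. */
        int tid      = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        unsigned long long begin = max * tid / nthreads;
        unsigned long long end   = max * (tid + 1) / nthreads;

        for (unsigned long long i = begin; i < end; i++)
            total += 1;    /* updates this thread's private partial count */
    }
    return total;
}
```

The slices are computed from the actual team size, so the function covers the whole range whether the runtime provides one thread or four.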

Second, Visual Studio applies different optimisation levels in Debug and Release configurations. In debug mode optimisation is disabled (/Od) while in release mode optimisation for speed is enabled (/O2). You say that in debug mode the VS code runs as fast as the GCC code. This probably means that you run GCC with its default optimisation level, which is no optimisation (-O0). Compile with -O2 or even with -O3 to get code on par with what VS generates in release mode.

Third, you are running Ubuntu inside a virtual machine. How many CPUs does the virtual machine have access to?
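
One way to check from inside the guest is to ask the OpenMP runtime itself. The sketch below uses omp_get_num_procs() and omp_get_max_threads(); the function name report_cpus is made up for illustration.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without -fopenmp. */
static int omp_get_num_procs(void)   { return 1; }
static int omp_get_max_threads(void) { return 1; }
#endif

/* Prints what the OpenMP runtime sees and returns the processor count. */
static int report_cpus(void)
{
    int procs   = omp_get_num_procs();    /* processors available to the program */
    int threads = omp_get_max_threads();  /* team size a parallel region would request */

    printf("processors visible: %d, max threads: %d\n", procs, threads);
    if (threads > procs)
        printf("more threads than processors: they will take turns on the cores\n");
    return procs;
}
```

If this reports a single processor while OMP_NUM_THREADS is 4, the four threads share one virtual CPU and no speed-up can be expected.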

Fourth, why are you reimplementing the OpenMP parallel for worksharing directive?
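
For comparison, a sketch of the same kind of loop written with the worksharing directive. The loop body here adds up the indices, which is an assumption made for the example: an optimising compiler is free to delete an empty loop, so the original empty body would measure nothing at -O2. The function name parallel_sum is hypothetical.

```c
/* Build with optimisation and OpenMP enabled, e.g.: gcc -O2 -fopenmp file.c */

/* Sums 0..n-1; the OpenMP runtime divides the iterations among the threads. */
static unsigned long long parallel_sum(unsigned long long n)
{
    unsigned long long total = 0;

    /* i is the loop variable of the worksharing loop, so it is private;
       the reduction clause gives each thread its own partial total and
       adds the partial totals together at the end. */
    #pragma omp parallel for reduction(+:total)
    for (unsigned long long i = 0; i < n; i++)
        total += i;

    return total;
}
```

No thread numbers or hand-written range splits are needed, and the code works unchanged for any value of OMP_NUM_THREADS.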
