lokains
lokains

Reputation: 127

openmp for loop is not parallelized

I'm trying to measure the running time of the parallel version and the serial one.

I have such a program:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <omp.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/times.h>

#define ARRAY_SIZE 1024 * 20

long time_delta = 0;

struct rusage rusage_start;
struct rusage rusage_finish;

void bubble_sort(unsigned int* array) {
    unsigned int tmp = 0;
    bool no_swap = 0;
    for (unsigned int i = ARRAY_SIZE - 1; i >= 0; --i)
    {
        no_swap = 1;
        {
            #pragma omp parallel for num_threads(4)
            for (unsigned int j = 0; j < i; j++)
            {
                if (array[j] > array[j + 1])
                {
                    tmp = array[j];
                    array[j] = array[j + 1];
                    array[j + 1] = tmp;
                    no_swap = 0;
                }
            }
        }
        if (no_swap)
            break;
    }
}

int main(int argc, char* argv[]) {
    (void)argc;
    (void)argv;
    srand(time(NULL));
    unsigned int* array = malloc(sizeof(unsigned int) * ARRAY_SIZE);
    if(!array) { return -1; }
    for(unsigned int i = 0; i < ARRAY_SIZE; ++i) {
        array[i] = rand() % ARRAY_SIZE;
    }
    getrusage(RUSAGE_SELF, &rusage_start);
    
    //sort
    bubble_sort(array);

    getrusage(RUSAGE_SELF, &rusage_finish);
    time_delta = (1000000 * (rusage_finish.ru_utime.tv_sec - rusage_start.ru_utime.tv_sec)) + (rusage_finish.ru_utime.tv_usec - rusage_start.ru_utime.tv_usec);
    printf("Time: %li microseconds\n", time_delta);
    free(array);
    return 0;
}

I compile and measure time like this:

gcc -openmp main.c -o prog && for n in {1..10}; do ./prog; done

The problem is that, if I change the number of threads in the function before for or remove the directive altogether, nothing changes.

What am I doing wrong?

Everything seems to be correct. (I run the code on a VM with 4 cores; lscpu sees them.)

Upvotes: 1

Views: 130

Answers (1)

Adrian Mole
Adrian Mole

Reputation: 51825

Your for loop cannot (or should not) be parallelized because a variable declared outside its scope (tmp) is modified inside. In your case, this particular issue can be fixed by removing the current declaration you have for tmp and placing it inside the if block:

    if (array[j] > array[j + 1])
    {
        unsigned int tmp = array[j]; // Now a LOCAL variable (one for each loop).
        array[j] = array[j + 1];
        array[j + 1] = tmp;
        no_swap = 0;
    }

In your code as it stands, there is a (high) possibility of data races if loops executing in parallel try to simultaneously read or write the single tmp variable.

However, you also have a similar problem with the no_swap variable and, more importantly, with the array data itself: if one loop's j is the same as another loop's j + 1, then the attempt to swap the elements will cause a data race (access collision).

To summarize: parallelization of a bubble sort is not practical unless you create far more complex code. For an example, see here.

Upvotes: 1

Related Questions