user3000477

Reputation: 81

Multithreaded program outputs different results every time it runs

I have been trying to create a multithreaded program that calculates the multiples of 3 and 5 from 1 to 999, but I can't seem to get it right: every time I run it I get a different value. I think it might have to do with the fact that I use a shared variable with 10 threads, but I have no idea how to get around that. Also, the program does work if I calculate the multiples of 3 and 5 from 1 to 9.

#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
#include <string.h>

#define NUM_THREADS 10
#define MAX 1000

//finds multiples of 3 and 5 and sums up all of the multiples
int main(int argc, char ** argv)
{
    omp_set_num_threads(10);//set number of threads to be used in the parallel loop


    unsigned int NUMS[1000] = { 0 };
    int j = 0;

    #pragma omp parallel
    {

        int ID = omp_get_thread_num();//get thread ID

        int i;
        for(i = ID + 1;i < MAX; i+= NUM_THREADS)
        {
            if( i % 5 == 0 || i % 3 == 0)
            {
                NUMS[j++] = i;//Store Multiples of 3 and 5 in an array to sum up later  

            }   
        }


    }

    int i = 0;
    unsigned int total;
    for(i = 0; NUMS[i] != 0; i++)total += NUMS[i];//add up multiples of 3 and 5

    printf("Total : %d\n", total);
    return 0;
}

Upvotes: 0

Views: 2222

Answers (4)

Z boson

Reputation: 33659

In your code, j is a shared induction variable. You can't rely on a shared induction variable with multiple threads: the unsynchronized j++ is a race, and making it atomic every iteration is correct but not efficient.
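
For reference, that correct-but-slow atomic fix would look like this in the question's parallel block; a minimal sketch using OpenMP 3.1's atomic capture, with NUMS, j, MAX and NUM_THREADS as declared in the question:

#pragma omp parallel
{
    int ID = omp_get_thread_num();
    int i;
    for(i = ID + 1; i < MAX; i += NUM_THREADS)
    {
        if(i % 5 == 0 || i % 3 == 0)
        {
            int idx;
            #pragma omp atomic capture
            idx = j++;          // atomically claim the next free slot
            NUMS[idx] = i;      // safe: each thread owns the slot it claimed
        }
    }
}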

You could find a special solution that avoids induction variables entirely (for example using wheel factorization with seven spokes {0,3,5,6,9,10,12} out of 15), or you could find a general solution using private induction variables like this:

unsigned int NUMS[MAX] = { 0 };    // assuming the declarations from the question
unsigned int total = 0;
int i, j = 0;
#pragma omp parallel
{
    int k = 0;                              // private induction variable: no race
    unsigned int NUMS_local[MAX] = {0};
    #pragma omp for schedule(static) nowait reduction(+:total)
    for(i=0; i<MAX; i++) {
        if(i%5==0 || i%3==0) {
            NUMS_local[k++] = i;
            total += i;
        }
    }
    #pragma omp for schedule(static) ordered
    for(i=0; i<omp_get_num_threads(); i++) {
        #pragma omp ordered                 // threads append their chunks in thread order
        {
            memcpy(&NUMS[j], NUMS_local, sizeof *NUMS * k);
            j += k;
        }
    }
}

This solution does not make optimal use of memory, however. A better solution would use something like std::vector from C++, which you could implement in C using realloc, but I'm not going to do that for you.
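
For the curious, a minimal sketch of such a realloc-backed growable buffer in C; the vec type and vec_push name are made up for illustration, and error checking on realloc is omitted:

#include <stdlib.h>

typedef struct { unsigned int *data; size_t len, cap; } vec;

static void vec_push(vec *v, unsigned int x)
{
    if(v->len == v->cap) {
        v->cap = v->cap ? 2*v->cap : 16;    // grow geometrically, like std::vector
        v->data = realloc(v->data, v->cap * sizeof *v->data);
    }
    v->data[v->len++] = x;
}

Each thread would push into its own vec, and the per-thread chunks would be merged at the end, just like NUMS_local above.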

Edit:

Here is a special solution using wheel factorization which does not use a shared induction variable:

unsigned int NUMS[MAX] = { 0 };    // assuming the declarations from the question
unsigned int total = 0;
int wheel[] = {0,3,5,6,9,10,12};   // offsets of the multiples of 3 or 5 in each block of 15
int n = MAX/15;
#pragma omp parallel for reduction(+:total)
for(int i=0; i<n; i++) {
    for(int k=0; k<7; k++) {
        NUMS[7*i + k] = 15*i + wheel[k];   // index 7*i+k is deterministic: no shared counter
        total += NUMS[7*i + k];
    }
}
//now clean up for MAX not a multiple of 15
int j = n*7;
for(int i=n*15; i<MAX; i++) {
    if(i%5==0 || i%3==0) {
        NUMS[j++] = i;
        total += i;
    }
}
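
As a sanity check: for MAX = 1000 this should give the same result as a serial loop, Total : 233168 (the wheel stores 0 at NUMS[0], which is harmless for the sum).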

Edit: It's possible to do this without the critical section implied by the ordered clause. This version does the memcpy in parallel and also makes better use of memory, at least for the shared array.

int *NUMS;
int *prefix;
int total=0, j;
#pragma omp parallel
{
    int i;
    int nthreads = omp_get_num_threads();
    int ithread  = omp_get_thread_num();
    #pragma omp single
    {
        prefix = malloc(sizeof *prefix * (nthreads+1));
        prefix[0] = 0;
    }
    int k = 0;
    unsigned int NUMS_local[MAX] = {0};
    #pragma omp for schedule(static) nowait reduction(+:total)
    for(i=0; i<MAX; i++) {
        if(i%5==0 || i%3==0) {
            NUMS_local[k++] = i;
            total += i;
        }
    }
    prefix[ithread+1] = k;                  // each thread records how many values it found
    #pragma omp barrier
    #pragma omp single
    {
        for(i=1; i<nthreads+1; i++) prefix[i] += prefix[i-1];  // prefix sum of the counts
        NUMS = malloc(sizeof *NUMS * prefix[nthreads]);
        j = prefix[nthreads];               // total number of multiples found
    }
    memcpy(&NUMS[prefix[ithread]], NUMS_local, sizeof *NUMS * k);
}
free(prefix);
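
The reason no locking is needed here is that prefix[ithread] and prefix[ithread+1] bound a disjoint destination range in NUMS for every thread, so the memcpy calls can never overlap. This prefix-sum trick is a standard way to parallelize "filter into an array" operations.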

Upvotes: 1

Unavailable

Reputation: 681

This is a typical thread synchronization issue. All you need to do is use a kernel synchronization object to make the desired operation atomic (incrementing the variable j, in your case). That could be a mutex, semaphore, or event object, depending on the operating system you're working on. But whatever your development environment is, to provide atomicity the fundamental flow should look like the following pseudo-code:

{
    lock(kernel_object)
    // ...
    // do your critical operation (increment your variable j in your case)
    // ++j;
    // ...
    unlock(kernel_object)
}

If you're working on the Windows operating system, the environment provides some special synchronization mechanisms (e.g. InterlockedIncrement or InitializeCriticalSection). If you're working on a Unix/Linux based operating system, you can use mutex or semaphore kernel synchronization objects. All of these synchronization mechanisms stem from the concept of semaphores, invented by Edsger W. Dijkstra in the early 1960s.

Here are some basic examples below:

Linux

#include <pthread.h>
#include <stdlib.h>

pthread_mutex_t g_mutexObject = PTHREAD_MUTEX_INITIALIZER;
int j = 0;                              // the shared counter

int main(int argc, char* argv[])
{
    // ...

    pthread_mutex_lock(&g_mutexObject);
    ++j;                                // incrementing j atomically
    pthread_mutex_unlock(&g_mutexObject);

    // ...

    pthread_mutex_destroy(&g_mutexObject);

    // ...

    exit(EXIT_SUCCESS);
}

Windows

#include <Windows.h>
#include <stdlib.h>

CRITICAL_SECTION g_csObject;
int j = 0;                              // the shared counter

int main(void)
{
    // ...

    InitializeCriticalSection(&g_csObject);

    // ...

    EnterCriticalSection(&g_csObject);
    ++j;                                // incrementing j atomically
    LeaveCriticalSection(&g_csObject);

    // ...

    DeleteCriticalSection(&g_csObject);

    // ...

    exit(EXIT_SUCCESS);
}

or just simply:

#include <Windows.h>
#include <stdlib.h>

LONG volatile g_j;  // our little j must be volatile in here now

int main(void)
{
    // ...

    InterlockedIncrement(&g_j);        // incrementing j atomically

    // ...

    exit(EXIT_SUCCESS);
}

Upvotes: 0

made1993

Reputation: 1

The problem you have is that the threads don't necessarily execute in order, so the last thread to write may have read an old value of j, and you end up overwriting data that another thread already stored.

There is a way, using the OpenMP options, to have the threads in a loop perform a summation when they finish: the reduction clause. You have to write something like this to use it:

#pragma omp parallel for reduction(+:sum)
for(k=0;k<num;k++)
{
    sum = sum + A[k]*B[k];
}
/* End of the computation */
gettimeofday(&fin,NULL);

All you have to do is pick up the result in "sum"; this is from an old piece of code I have that performs a summation (the gettimeofday call is left over from timing it).

The other option you have is the dirty one: somehow make the threads wait and fall into order by using a call to the OS. This is easier than it looks. Something like this would do it:

#pragma omp parallel
    for(i = ID + 1;i < MAX; i+= NUM_THREADS)
    {
        printf("asdasdasdasdasdasdasdas");
        if( i % 5 == 0 || i % 3 == 0)
        {
            NUMS[j++] = i;//Store Multiples of 3 and 5 in an array to sum up later  

        }   
    }

But I recommend you read through the OpenMP options fully.

Upvotes: -1

Andrew Henle

Reputation: 1

"j++" is not an atomic operation.

It means "take the value contained at the storage location called j, use it in the current statement, add one to it, then store it back in the same location it came from".

(That's the simple answer. Optimization and whether or not the value is kept in a register can and will change things even more.)

When you have multiple threads doing that to the same variable all at the same time, you get different and unpredictable results.
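
Spelled out in C, the question's NUMS[j++] = i; behaves like three separate steps that two threads can interleave:

int tmp = j;       // 1. load the shared j
NUMS[tmp] = i;     // 2. use the old value
j = tmp + 1;       // 3. store j + 1

If two threads both perform step 1 before either performs step 3, they write to the same NUMS slot and one increment is lost.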

You can use per-thread variables to get around that.
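
A minimal sketch of that idea for the question's sum (each thread accumulates into its own private variable, and the shared total is touched only once per thread; the reduction clause shown in another answer automates exactly this):

unsigned int total = 0;
#pragma omp parallel
{
    unsigned int local = 0;             // private to this thread
    #pragma omp for
    for(int i = 1; i < MAX; i++)
        if(i % 5 == 0 || i % 3 == 0)
            local += i;
    #pragma omp atomic
    total += local;                     // one synchronized add per thread
}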

Upvotes: 2
