user2188453
user2188453

Reputation: 1145

How to ensure the order of execution in multi-threaded programming?

Is there any problem within the following multi-threaded code? It always give me inconsistent results. It looks like the compiler optimization could move the flag setting line before the data processing line which causes a serious data race condition.

Is there any way to avoid this without adding a barrier?

#pragma omp parallel num_threads(16)

int tid=omp_get_thread_num();

if (tid<8)
{
   copydata(arrayofPtrs[tid]);

   flag[tid]=1;//flag is an array of volatile int where its initial values are all 0.

}
else
{
   for (int i=0; i<100000; ++i)
   {
     if (flag[tid-8]==1)
      {
       processingdata(arrayofPtrs[tid-8]);
       break;
       }
     else
       Sleep(200);
   };
};

Upvotes: 2

Views: 926

Answers (2)

Hristo Iliev
Hristo Iliev

Reputation: 74395

As far as I can understand your code, processing of data cannot continue unless the data has been copied, hence it makes no sense to do it in parallel - the processing threads will be wasting CPU time waiting for the copying threads to finish and set the flag. Why don't you then merge both operations in a single block:

#pragma omp parallel num_threads(8)
{
   int tid = omp_get_thread_num();

   copydata(arrayofPtrs[tid]);
   processingdata(arrayofPtrs[tid]);
}

If you still want to keep your original idea, probably if both copying and processing occur asynchronously and in a repeating fashion, then you need to synchronise the access to the flag using Open MP atomic operations:

#pragma omp parallel num_threads(16)
{
   int tid = omp_get_thread_num();

   if (tid < 8)
   {
      copydata(arrayofPtrs[tid]);

      #pragma omp atomic write
      flag[tid] = 1;//flag is an array of volatile int where its initial values are all 0.
   }
   else
   {
      for (int i = 0; i < 100000; ++i)
      {
         #pragma omp atomic read
         int ready = flag[tid-8];
         if (ready == 1)
         {
            processingdata(arrayofPtrs[tid-8]);
            break;
         }
         else
            Sleep(200);
      }
   }
}

With most compilers the atomic construct has the side effect that the referred variables become volatile. You can also explicitly update the memory view using flush:

#pragma omp atomic write
flag[tid] = 1;
#pragma omp flush(flag)

The read and write clauses are only supported on recent OpenMP versions. This Sleep() looks like you are using Win32 API, hence probably using MSVC, which does not support read and write modifiers since it only implements OpenMP 2.0, but still the code should compile and work as intended.

Another possible way to do it without busy loops is to use OpenMP locks. Initialise an array of locks, acquire them in the copying threads, then have each processing thread wait to acquire the lock:

omp_lock_t locks[8];
for (int i = 0; i < 8; i++)
   omp_init_lock(&locks[i]);

#pragma omp parallel num_threads(16)
{
    int tid = omp_get_thread_num();

    // Have the first 8 threads acquire the locks
    if (tid < 8)
       omp_set_lock(&locks[tid]);

    #pragma omp barrier

    // Now locks are set and processing can continue

    if (tid < 8)
    {
       copydata(arrayofPtrs[tid]);
       omp_unset_lock(&locks[tid]);
    }
    else
    {
       omp_set_lock(&locks[tid-8]);
       processingdata(arrayofPtrs[tid-8]);
       omp_unset_lock(&locks[tid-8]);
    }
 }

 for (int i = 0; i < 8; i++)
    omp_destroy_lock(&locks[i]);

You can also implement the same using Win32 events of POSIX semaphores instead of OpenMP locks. The advantage of this approach is that you don't need to explicitly loop while waiting for the flag to be set. Rather the omp_set_lock() call would block until the copying thread releases its lock. With Win32 events you can use WaitForSingleObject(hEvent, INFINITE); to wait for the copying thread to signal it.

Upvotes: 0

perreal
perreal

Reputation: 98018

You can use a loop around the flag test for the processing threads so that they will spin lock on the flag until it is set. However, that part of the code looks sequential, so why are you using multiple threads for copy/process? You can copy with a thread and go on to do processing that block with the same thread.

Upvotes: 1

Related Questions