Reputation: 1145
Is there any problem within the following multi-threaded code? It always give me inconsistent results. It looks like the compiler optimization could move the flag setting line before the data processing line which causes a serious data race condition.
Is there any way to avoid this without adding a barrier?
#pragma omp parallel num_threads(16)
int tid=omp_get_thread_num();
if (tid<8)
{
copydata(arrayofPtrs[tid]);
flag[tid]=1;//flag is an array of volatile int where its initial values are all 0.
}
else
{
for (int i=0; i<100000; ++i)
{
if (flag[tid-8]==1)
{
processingdata(arrayofPtrs[tid-8]);
break;
}
else
Sleep(200);
};
};
Upvotes: 2
Views: 926
Reputation: 74395
As far as I can understand your code, processing of data cannot continue unless the data has been copied, hence it makes no sense to do it in parallel - the processing threads will be wasting CPU time waiting for the copying threads to finish and set the flag. Why don't you then merge both operations in a single block:
#pragma omp parallel num_threads(8)
{
int tid = omp_get_thread_num();
copydata(arrayofPtrs[tid]);
processingdata(arrayofPtrs[tid]);
}
If you still want to keep your original idea, probably if both copying and processing occur asynchronously and in a repeating fashion, then you need to synchronise the access to the flag using Open MP atomic
operations:
#pragma omp parallel num_threads(16)
{
int tid = omp_get_thread_num();
if (tid < 8)
{
copydata(arrayofPtrs[tid]);
#pragma omp atomic write
flag[tid] = 1;//flag is an array of volatile int where its initial values are all 0.
}
else
{
for (int i = 0; i < 100000; ++i)
{
#pragma omp atomic read
int ready = flag[tid-8];
if (ready == 1)
{
processingdata(arrayofPtrs[tid-8]);
break;
}
else
Sleep(200);
}
}
}
With most compilers the atomic
construct has the side effect that the referred variables become volatile. You can also explicitly update the memory view using flush
:
#pragma omp atomic write
flag[tid] = 1;
#pragma omp flush(flag)
The read
and write
clauses are only supported on recent OpenMP versions. This Sleep()
looks like you are using Win32 API, hence probably using MSVC, which does not support read
and write
modifiers since it only implements OpenMP 2.0, but still the code should compile and work as intended.
Another possible way to do it without busy loops is to use OpenMP locks. Initialise an array of locks, acquire them in the copying threads, then have each processing thread wait to acquire the lock:
omp_lock_t locks[8];
for (int i = 0; i < 8; i++)
omp_init_lock(&locks[i]);
#pragma omp parallel num_threads(16)
{
int tid = omp_get_thread_num();
// Have the first 8 threads acquire the locks
if (tid < 8)
omp_set_lock(&locks[tid]);
#pragma omp barrier
// Now locks are set and processing can continue
if (tid < 8)
{
copydata(arrayofPtrs[tid]);
omp_unset_lock(&locks[tid]);
}
else
{
omp_set_lock(&locks[tid-8]);
processingdata(arrayofPtrs[tid-8]);
omp_unset_lock(&locks[tid-8]);
}
}
for (int i = 0; i < 8; i++)
omp_destroy_lock(&locks[i]);
You can also implement the same using Win32 events of POSIX semaphores instead of OpenMP locks. The advantage of this approach is that you don't need to explicitly loop while waiting for the flag to be set. Rather the omp_set_lock()
call would block until the copying thread releases its lock. With Win32 events you can use WaitForSingleObject(hEvent, INFINITE);
to wait for the copying thread to signal it.
Upvotes: 0
Reputation: 98018
You can use a loop around the flag test for the processing threads so that they will spin lock on the flag until it is set. However, that part of the code looks sequential, so why are you using multiple threads for copy/process? You can copy with a thread and go on to do processing that block with the same thread.
Upvotes: 1