Mattia Baldari

Reputation: 617

Single computation of a variable that has to be visible for every thread

I am working on a project with OpenMP. In this project I need to do a computation, in particular:

gap = (DEFAULT_HEIGHT / nthreads);

where DEFAULT_HEIGHT is a constant and nthreads is the number of threads inside my parallel region. My problem is that I can't compute the variable gap outside the parallel region, because I need to be inside it to know nthreads. On the other hand, I don't want to compute gap in every thread. Moreover, I can't write code like this:

if(tid==0){
    gap = (DEFAULT_HEIGHT / nthreads);
}

because I don't know the order of execution of the threads, so it could be that thread 0 starts last and all my other computations that need gap will be wrong (because it will not be set yet). So, is there a way to do this computation only once without this problem?

Thanks

Upvotes: 2

Views: 93

Answers (2)

High Performance Mark

Reputation: 78316

Ensure that gap is a shared variable and enclose it in an OpenMP single directive, something like

#pragma omp single
{ 
    gap = (DEFAULT_HEIGHT / nthreads);
}

Only one thread will execute the code enclosed in the single directive; the other threads will wait at the end of the enclosed block of code.
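
For concreteness, here is a minimal, self-contained sketch of that approach (the value of DEFAULT_HEIGHT and the printf are placeholders, not taken from the question):

#include <stdio.h>
#include <omp.h>

#define DEFAULT_HEIGHT 1024

int main(void)
{
    int gap = 0;                        /* declared outside the region, so shared */

    #pragma omp parallel shared(gap)
    {
        int nthreads = omp_get_num_threads();

        #pragma omp single
        {
            gap = (DEFAULT_HEIGHT / nthreads);  /* executed by exactly one thread */
        }
        /* implicit barrier at the end of single: every thread now sees gap */

        printf("thread %d sees gap = %d\n", omp_get_thread_num(), gap);
    }
    return 0;
}

Compile with something like gcc -fopenmp example.c.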

An alternative would be to make gap private and let all threads compute their own value. This might be faster, since the single option requires some synchronisation, which always takes time. If you are concerned, try both and compare results. (I think this is what ComicSansMS is suggesting.)

Upvotes: 6

ComicSansMS

Reputation: 54589

Here's the tradeoff: if only one thread does the computation, you have to synchronize access to that value. In particular, threads that only want to read the value also have to synchronize, as they usually have no other means of determining whether the write has already finished. If initialization of the variable is so expensive that it can compensate for this, you should go for it. But it's probably not.

Keep in mind that you can do a lot of computation on the CPU in the time it takes to fetch data from memory. Synchronizing this access properly will eat away additional cycles and can lead to undesired stalling effects. Even worse, the impact of such effects usually increases drastically with the number of threads sharing the resource.

It's not uncommon to accept some redundancy in parallel computation as the synchronization overhead easily nullifies any benefits from saved computation time for redundant data.
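
As a rough sketch of what that redundant-but-unsynchronised version looks like in OpenMP (again with a placeholder DEFAULT_HEIGHT):

#include <stdio.h>
#include <omp.h>

#define DEFAULT_HEIGHT 1024

int main(void)
{
    #pragma omp parallel
    {
        /* gap is declared inside the region, so each thread has its own copy;
           the division is repeated per thread, but no synchronisation is needed */
        int gap = (DEFAULT_HEIGHT / omp_get_num_threads());
        printf("thread %d computed gap = %d\n", omp_get_thread_num(), gap);
    }
    return 0;
}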

Upvotes: 3
