Reputation: 266
I have a computer with 4 cores and an OpenMP application with 2 heavy tasks.
int main()
{
    #pragma omp parallel sections
    {
        #pragma omp section
        WeightyTask1();
        #pragma omp section
        WeightyTask2();
    }
    return 0;
}
Each task contains a heavy part like this:
#pragma omp parallel for
for (int i = 0; i < N; i++)
{
    ...
}
I compiled the program with the -fopenmp option and ran export OMP_NUM_THREADS=4 before starting it.
The problem is that only two cores are loaded. How can I use all cores in my tasks?
Upvotes: 6
Views: 6012
Reputation: 392893
My initial reaction was: You have to declare more parallelism.
You have defined only two tasks that can run in parallel, so the sections construct can keep at most two threads busy. Any attempt by OpenMP to spread those two tasks over more than two cores would slow you down (because of cache locality and possible false sharing).
Edit: If the parallel for loops are of any significant size (say, not under 8 iterations) and you are still not seeing more than 2 cores used, look at omp_set_nested() and the OMP_NESTED=TRUE|FALSE environment variable.
This environment variable enables or disables nested parallelism. The setting of this environment variable can be overridden by calling the omp_set_nested() runtime library function. If nested parallelism is disabled, nested parallel regions are serialized and run in the current thread.
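As a rough illustration of the point above, here is a minimal sketch of switching nested parallelism on from code, assuming the runtime supports it. The names weighty_task and run_both_tasks are hypothetical stand-ins for the question's WeightyTask1/WeightyTask2, and the summing loop is placeholder work. The num_threads(2) clauses are one possible way to split four cores as 2 outer x 2 inner threads. omp_set_max_active_levels(2) is called alongside omp_set_nested(1) because OpenMP 5.0 deprecates omp_set_nested().

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* OpenMP runtime API, used only when built with -fopenmp */
#endif

/* Hypothetical stand-in for a weighty task: sums 0..n-1 with an
   inner team of up to 2 threads. */
long long weighty_task(int n)
{
    long long sum = 0;
    #pragma omp parallel for num_threads(2) reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Runs two tasks in an outer team of up to 2 threads. With nesting
   enabled, each task may open its own inner team. */
long long run_both_tasks(int n)
{
    long long r1 = 0, r2 = 0;
#ifdef _OPENMP
    omp_set_nested(1);             /* older API, deprecated in OpenMP 5.0 */
    omp_set_max_active_levels(2);  /* allow two active parallel levels */
#endif
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        r1 = weighty_task(n);      /* only this section writes r1 */
        #pragma omp section
        r2 = weighty_task(n);      /* only this section writes r2 */
    }
    return r1 + r2;
}
```

To try it, call run_both_tasks(N) from main and build with -fopenmp. Setting OMP_NESTED=TRUE in the environment is the alternative to the two runtime calls; whether all four cores actually get loaded still depends on the implementation.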
In the current implementation, nested parallel regions are always serialized. As a result, OMP_SET_NESTED does not have any effect, and omp_get_nested() always returns 0. If the -qsmp=nested_par option is on (only in non-strict OMP mode), nested parallel regions may employ additional threads as available. However, no new team will be created to run nested parallel regions. The default value for OMP_NESTED is FALSE.
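Since the passage above describes an implementation that always serializes nested regions, it can help to check what your own runtime does. The following is only a sketch: nested_team_size is a hypothetical helper name, and it assumes an OpenMP 3.0 or later runtime for omp_get_ancestor_thread_num(). It returns the number of threads seen inside an inner parallel region; a result of 1 suggests that the nested region was serialized (or that the program was built without -fopenmp).

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* OpenMP runtime API, used only when built with -fopenmp */
#endif

/* Hypothetical helper: reports the inner team size observed from
   inside a nested parallel region. 1 means the region ran serially. */
int nested_team_size(void)
{
    int size = 1;
#ifdef _OPENMP
    omp_set_nested(1);             /* older API, deprecated in OpenMP 5.0 */
    omp_set_max_active_levels(2);  /* request two active parallel levels */
    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            /* Exactly one thread writes: inner thread 0 of the team
               whose parent is outer thread 0. */
            if (omp_get_ancestor_thread_num(1) == 0 &&
                omp_get_thread_num() == 0)
                size = omp_get_num_threads();
        }
    }
#endif
    return size;
}
```

Printing nested_team_size() from main before running the real workload gives a quick indication of whether the inner parallel for loops can get threads of their own.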
Upvotes: 5