arbitUser1401

Reputation: 575

Would merging OpenMP regions give a performance benefit?

I have a parallel code that is purely MPI. It scales pretty well up to 8 cores. However, due to memory requirements I have to move to a hybrid (MPI + OpenMP) code. My code has the following structure:

for( /* sequential loop, 10e5 iterations */ )
{
    highly_parallelizable_function_call_1();
    some_sequential_work();
    highly_parallelizable_function_call_2();
    some_sequential_work();
    MPI_send();
    MPI_recv();
    highly_parallelizable_function_call_3();
    highly_parallelizable_function_call_4();
}

Roughly, functions 3 and 4 account for 90% of the time. I changed functions 3 and 4 to OpenMP parallel code, and profiling shows I only get a speedup of 4-5 from this. Hence this code might not scale as well as the MPI-only code. I suspect this could be due to threading overhead. To circumvent this, I would like to change the code so that the threads are created only once at the beginning, as follows:

#pragma omp parallel
for( /* sequential loop, 10e5 iterations */ )
{
    parallel_version_function_call_1();

    if( thread_id == 0 ) some_sequential_work();

    parallel_version_function_call_2();

    if( thread_id == 0 ) some_sequential_work();
    if( thread_id == 0 ) MPI_send();
    if( thread_id == 0 ) MPI_recv();

    parallel_version_function_call_3();
    parallel_version_function_call_4();
}
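
Here each parallel_version_function_call_*() would just contain an orphaned worksharing loop that reuses the threads of the enclosing parallel region, roughly like this (only a sketch; the body and the array names are placeholders for the real work):

void parallel_version_function_call_3(double *data, int n)
{
    /* Called from inside the enclosing "#pragma omp parallel" region,
     * so this "omp for" is an orphaned worksharing construct and the
     * iterations are split across the already-existing threads. */
    #pragma omp for
    for (int i = 0; i < n; ++i)
        data[i] = 2.0 * data[i];   /* stand-in for the real work */
    /* implicit barrier at the end of the "omp for" */
}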

Would doing something like this be beneficial?

Upvotes: 1

Views: 88

Answers (1)

user1829358

Reputation: 1091

I think that your current implementation does not account for Amdahl's law (google it if you like). Given that you only parallelized 90% of your code, the best possible speedup you can expect (on 8 cores) is:

Speedup = 1.0 / (p_seq + (1 - p_seq) / #cores)

Which in your case is:

Speedup = 1.0 / (0.1 + 0.9 / 8) ≈ 4.71
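
If you want to check the bound for other core counts, here is a quick sketch in plain C (the 0.9 parallel fraction is taken from your description):

#include <stdio.h>

/* Amdahl's law: upper bound on the speedup for a parallel fraction p
 * on n cores. p = 0.9 corresponds to your functions 3 and 4. */
static double amdahl(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    const double p = 0.9;
    for (int n = 2; n <= 16; n *= 2)
        printf("%2d cores -> max speedup %.2f\n", n, amdahl(p, n));
    return 0;
}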

So your current OpenMP parallelization is doing exactly what would be expected. Long story short: yes, the latter implementation should give you a better speedup, provided it means that functions 1 and 2 get parallelized as well.
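
Just to illustrate the structure, here is a minimal, compilable sketch of the "one big parallel region" approach. All the work functions from your question are replaced by one trivial stand-in, the MPI_send/MPI_recv pair is reduced to a dummy ring exchange with MPI_Sendrecv, and the iteration count is made small; the parts that matter are the omp master sections (with explicit barriers, since master has no implicit one) and the MPI_THREAD_FUNNELED initialisation:

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N     1000
#define NITER 100              /* stand-in for the 10e5 outer iterations */

static double data[N];

static void parallel_work(void)            /* stand-in for functions 1-4 */
{
    #pragma omp for                        /* orphaned worksharing loop  */
    for (int i = 0; i < N; ++i)
        data[i] += 1.0;
}                                          /* implicit barrier here */

int main(int argc, char **argv)
{
    int provided, rank, size;
    double halo = 0.0;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    for (int it = 0; it < NITER; ++it) {
        parallel_work();                   /* function 1 */

        #pragma omp master
        { data[0] += 1.0; }                /* some_sequential_work */
        #pragma omp barrier                /* master has no implicit barrier */

        parallel_work();                   /* function 2 */

        #pragma omp master
        {                                  /* only the master thread talks to MPI */
            data[0] += 1.0;                /* some_sequential_work */
            MPI_Sendrecv(&data[0], 1, MPI_DOUBLE, (rank + 1) % size, 0,
                         &halo,    1, MPI_DOUBLE, (rank + size - 1) % size, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        #pragma omp barrier

        parallel_work();                   /* function 3 */
        parallel_work();                   /* function 4 */
    }

    if (rank == 0)
        printf("rank 0 done, data[0] = %f, halo = %f\n", data[0], halo);
    MPI_Finalize();
    return 0;
}

Whether this actually helps then depends mostly on how much of some_sequential_work and the communication remains serial, which is again exactly the Amdahl bound above.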

Upvotes: 1
