Scott Abver

Reputation: 3

openMP: call parallel function from parallel region

I'm trying to parallelize my serial program with OpenMP. Below is the code: I have one big parallel region containing a number of internal "#pragma omp for" sections. In the serial version I have a function fftw_shift() which contains "for" loops as well.

The question is how to rewrite fftw_shift() properly so that the threads already existing in the external parallel region can split the "for" loops inside it, without spawning nested threads.

I'm not sure that my implementation works correctly. One option is to inline the whole function into the parallel region, but I'm trying to understand how to deal with it in the situation described.

int fftw_shift(fftw_complex *pulse, fftw_complex *shift_buf, int array_size)
{
 int j = 0;              //counter
 if ((pulse != nullptr) && (shift_buf != nullptr)) {
  if (omp_in_parallel()) {

   //shift the array
   #pragma omp for private(j) //schedule(dynamic)
   for (j = 0; j < array_size / 2; j++) {
    //left to right
    shift_buf[(array_size / 2) + j][REAL] = pulse[j][REAL]; //real
    shift_buf[(array_size / 2) + j][IMAG] = pulse[j][IMAG]; //imaginary

    //right to left
    shift_buf[j][REAL] = pulse[(array_size / 2) + j][REAL]; //real
    shift_buf[j][IMAG] = pulse[(array_size / 2) + j][IMAG]; //imaginary
   }
   //rewrite the array
   #pragma omp for private(j) //schedule(dynamic)
   for (j = 0; j < array_size; j++) {
    pulse[j][REAL] = shift_buf[j][REAL]; //real
    pulse[j][IMAG] = shift_buf[j][IMAG]; //imaginary
   }

   return 0;
  }
 }
 return -1; //null pointer or not inside a parallel region
}

....
#pragma omp parallel firstprivate(x, phase) if(array_size >= OMP_THREASHOLD)
{
 // First half-step
 #pragma omp for schedule(dynamic)
 for (x = 0; x < array_size; x++) {
  ..
 }

 // Forward FFT
 fftw_shift(pulse_x, shift_buf, array_size);
 #pragma omp master
 {
  fftw_execute(dft);
 }
 #pragma omp barrier
 fftw_shift(pulse_kx, shift_buf, array_size);
 ...
}

Upvotes: 0

Views: 855

Answers (1)

Zulan

Reputation: 22660

If you call fftw_shift from a parallel region, but not from within a work-sharing construct (i.e. not inside a parallel for), then you can use omp for just as if you were lexically inside the parallel region. This is called an orphaned directive.

However, your loops just copy data, so they are likely memory-bound; depending on your system's memory bandwidth, don't expect a perfect speedup.

Upvotes: 1
