flyree

Reputation: 29

problems when creating many plans and executing plans

I am a little confused about creating a many-plan by calling fftwf_plan_many_dft_r2c() and executing it with OpenMP. What I am trying to achieve here is to see whether explicitly using OpenMP and organizing the FFTW data myself can work together. (I know I "should" use the multithreaded version of FFTW, but I failed to get the expected speedup from it.)

My code looks like this:

/* some helper code omitted */
#define N (1024*1024) // N is the total size of the 1d FFT
fftwf_plan p;
float * in;
fftwf_complex *out;

omp_set_num_threads(threadNum); // Suppose threadNum is 2 here
in = fftwf_alloc_real(2*(N/2+1));
std::fill(in, in + 2*(N/2+1), 1.1f); // fill with an arbitrary real value for testing
out = (fftwf_complex *)&in[0];  // for in-place transformation
/* Problems start from here */
int n[] = {N/threadNum}; // according to the manual, n is the size of each "howmany" transformation
p = fftwf_plan_many_dft_r2c(1, n, threadNum, in, NULL, 1, 1, out, NULL, 1, 1, FFTW_ESTIMATE);

#pragma omp parallel for
for (int i = 0; i < threadNum; i++)
{
    fftwf_execute(p);
    // fftwf_execute_dft_r2c(p,in+i*N/threadNum,out+i*N/threadNum);
}

What I got is like this:

If I use fftwf_execute(p), the program executes successfully, but the result does not look correct. (I compared the result with a version that does not use the many-plan or OpenMP.)

If I use fftwf_execute_dft_r2c(), I get a segmentation fault.

Can somebody help me here? How should I partition the data across multiple threads? Or is this approach not correct in the first place?

Thank you in advance.

flyree

Upvotes: 1

Views: 534

Answers (1)

tir38

Reputation: 10451

  • Do you properly allocate memory for out? Does this:
out = (fftwf_complex *)&in[0];  // for in-place transformation

do the same as this:

out = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*numberOfOutputColumns);
  • You are trying to access p inside your parallel block without specifically telling OpenMP how to use it. It should be:

#pragma omp parallel for shared(p)

  • If you are going to split the work across n threads, I would think you'd want to explicitly tell OpenMP to use n threads:

#pragma omp parallel for shared(p) num_threads(n)

  • Does this code work without multithreading? If you remove the for loop and the OpenMP pragma and execute fftwf_execute(p) just once, does it work?

  • I don't know much about FFTW's many-plan interface, but it seems like p is really many plans rolled into one rather than a single plan. So when you "execute" p, you are executing all of the transforms at once, right? You don't need to execute p iteratively in a loop (rough sketch below).
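
To illustrate that last point, here is a rough sketch of a single many-plan that computes both halves in one fftwf_execute() call. The idist/odist values assume the padded in-place layout from the question and are my reading of the advanced-interface docs, so double-check them against the manual (link with -lfftw3f):

#include <fftw3.h>

#define N         (1024*1024)   /* total 1d FFT size, as in the question */
#define threadNum 2

int main(void)
{
    int chunk = N / threadNum;            /* logical length of each sub-FFT   */
    int n[]   = { chunk };
    int idist = 2 * (chunk/2 + 1);        /* padded real samples per chunk    */
    int odist = chunk/2 + 1;              /* complex outputs per chunk        */

    float         *in  = fftwf_alloc_real(threadNum * idist);
    fftwf_complex *out = (fftwf_complex *)in;   /* in-place transform         */

    for (int i = 0; i < threadNum * idist; i++)
        in[i] = 1.1f;                     /* fill the padded buffer           */

    fftwf_plan p = fftwf_plan_many_dft_r2c(1, n, threadNum,
                                           in,  NULL, 1, idist,
                                           out, NULL, 1, odist,
                                           FFTW_ESTIMATE);

    fftwf_execute(p);                     /* one call runs all sub-transforms */

    fftwf_destroy_plan(p);
    fftwf_free(in);
    return 0;
}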

I'm still learning about OpenMP + FFTW, so I could be wrong on these.
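
And if you do want to drive the chunks from your own OpenMP loop, here is a rough sketch of the other route the question hints at: plan once for a single chunk, then call the new-array execute function on each chunk. The FFTW manual says fftwf_execute() and the new-array execute variants are the only thread-safe routines, so this loop should be legal; the variable names and the out-of-place layout are my own choices, not tested code:

#include <fftw3.h>

#define N         (1024*1024)
#define threadNum 2

int main(void)
{
    int chunk = N / threadNum;              /* length of each sub-FFT    */
    int ncplx = chunk/2 + 1;                /* complex outputs per chunk */

    float         *in  = fftwf_alloc_real(threadNum * chunk);
    fftwf_complex *out = fftwf_alloc_complex(threadNum * ncplx);

    for (int i = 0; i < threadNum * chunk; i++)
        in[i] = 1.1f;

    /* Plan once, for ONE chunk, out of place. */
    fftwf_plan p = fftwf_plan_dft_r2c_1d(chunk, in, out, FFTW_ESTIMATE);

    #pragma omp parallel for
    for (int i = 0; i < threadNum; i++)
    {
        /* The output pointer advances in fftwf_complex units, not floats;
           mixing the two up is one way to get the segfault described above.
           The new arrays must also keep the same alignment as the ones the
           plan was created with -- see the manual's notes on new-array
           execution. */
        fftwf_execute_dft_r2c(p, in + i*chunk, out + i*ncplx);
    }

    fftwf_destroy_plan(p);
    fftwf_free(in);
    fftwf_free(out);
    return 0;
}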

Upvotes: 1
