Allocate 3D Array for FFTW using fftw_malloc

Question

I am currently trying to improve the performance of my multithreaded FFTW implementation. In the FFTW3 documentation I read that, for the best possible performance, fftw_malloc should be used to allocate the input and output arrays of the DFT.

Since I am dealing with large 3D arrays of size 256*256*256, I have to create them on the heap with

const unsigned int RES = 256;

std::complex<double> (*V)[RES][RES];
V = new std::complex<double>[RES][RES][RES];

After initialization, I create multithreaded in-place FFTW plans for the 3D DFTs as follows:

int N_Threads = omp_get_max_threads();
fftw_init_threads();
fftw_plan_with_nthreads(N_Threads);

fftw_complex *input_V = reinterpret_cast<fftw_complex*>(opr.V);
fftw_plan FORWARD_V = fftw_plan_dft_3d(RES, RES, RES, input_V, input_V, FFTW_FORWARD, FFTW_MEASURE);
fftw_plan BACKWARD_V = fftw_plan_dft_3d(RES, RES, RES, input_V, input_V, FFTW_BACKWARD, FFTW_MEASURE);

My question now is: how do I create these plans using fftw_malloc instead?

In the fftw3 documentation I can only find

fftw_complex *in;
in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);

which I understand to be a 1D example. Do I have to flatten my 3D array into a 1D buffer, or is fftw_malloc not possible (or not advisable) in this case?

Blindy · Accepted Answer

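malloc and its relatives (including fftw_malloc) allocate one-dimensional buffers, so what you want is a single buffer large enough to hold all of your three-dimensional data. That is exactly what fftw_plan_dft_3d expects: one contiguous block in row-major order, which is also the layout of your new[]-allocated array.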

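fftw_complex *input_V = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * RES * RES * RES);
// Release it later with fftw_free(input_V), not with free() or delete[].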
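Because the 3D plan treats the buffer as one contiguous row-major block (the last index varies fastest), element (i, j, k) of a RES x RES x RES array sits at offset (i*RES + j)*RES + k. The sketch below shows two ways to work with such a flat buffer: an explicit index helper, and a cast that keeps the V[i][j][k] syntax from the question. The names idx3, Grid and as_grid are hypothetical helpers, not part of FFTW, and the input is assumed to be the fftw_malloc result cast to std::complex<double>*, which FFTW documents as bit-compatible with fftw_complex.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical helper: row-major offset of element (i, j, k) in an
// n x n x n array, the layout fftw_plan_dft_3d expects.
constexpr std::size_t idx3(std::size_t i, std::size_t j, std::size_t k,
                           std::size_t n) {
    return (i * n + j) * n + k;
}

// Hypothetical alias: pointer to a RES x RES plane, the same type as V
// in the question.
template <std::size_t RES>
using Grid = std::complex<double> (*)[RES][RES];

// Hypothetical helper: view a flat buffer of RES*RES*RES elements
// (for example memory obtained from fftw_malloc) as a 3D array,
// so that V[i][j][k] keeps working.
template <std::size_t RES>
Grid<RES> as_grid(std::complex<double>* flat) {
    assert(flat != nullptr);
    return reinterpret_cast<Grid<RES>>(flat);
}
```

In the question's code this could look like `auto V = as_grid<RES>(reinterpret_cast<std::complex<double>*>(input_V));`, with input_V itself passed to fftw_plan_dft_3d. Keep in mind that FFTW_MEASURE overwrites the array while planning, so fill V only after the plans have been created.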

I read that for best-possible performance, the fftw_malloc function should be used

It's important to ask "why" whenever you see a statement like that. Specifically, FFTW can use its SIMD code paths (SSE, AVX, ...) only when the arrays are suitably aligned, so fftw_malloc simply returns memory with the alignment those code paths need. It's not magic, and you can do that yourself as well, for example with std::aligned_alloc.
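As a do-it-yourself alternative to fftw_malloc, here is a minimal sketch built on std::aligned_alloc, assuming a C++17 standard library that provides it (MSVC does not; it offers _aligned_malloc instead). The 64-byte alignment and the names kAlign, FreeDeleter, AlignedBuffer and alloc_aligned are assumptions for illustration; 64 bytes covers the widest common SIMD registers (AVX-512).

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

using cplx = std::complex<double>;

// Assumed alignment: 64 bytes satisfies SSE (16), AVX (32) and AVX-512 (64).
constexpr std::size_t kAlign = 64;

// Memory from std::aligned_alloc must be released with std::free.
// std::complex<double> is trivially destructible, so no destructor calls are needed.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

using AlignedBuffer = std::unique_ptr<cplx[], FreeDeleter>;

// Hypothetical stand-in for fftw_malloc: allocate `count` zeroed complex
// values starting on a kAlign-byte boundary.
AlignedBuffer alloc_aligned(std::size_t count) {
    // std::aligned_alloc expects the size to be a multiple of the alignment.
    std::size_t bytes = count * sizeof(cplx);
    bytes = (bytes + kAlign - 1) / kAlign * kAlign;

    void* raw = std::aligned_alloc(kAlign, bytes);
    if (raw == nullptr) {
        throw std::bad_alloc();
    }
    assert(reinterpret_cast<std::uintptr_t>(raw) % kAlign == 0);

    cplx* data = static_cast<cplx*>(raw);
    std::uninitialized_value_construct_n(data, count);  // every element = (0, 0)
    return AlignedBuffer(data);
}
```

For the question's case, alloc_aligned(RES * RES * RES) would replace the new[] call, and reinterpret_cast<fftw_complex*>(buf.get()) would be handed to fftw_plan_dft_3d. Keep allocation and deallocation paired: std::free for std::aligned_alloc memory, fftw_free for fftw_malloc memory.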
