AstrOne
AstrOne

Reputation: 3769

FFTW3, cuFFT and in-place transforms

I am trying to do a real-to-complex FFT transform using FFTW3. So far I've managed to do it using out-of-place transform, but I am having trouble implementing the in-place version of it. I was under the impression that the only things you have to change for the in-place transform are: 1) Make sure your data array has enough space to hold the complex part of the operation, 2) when you create the plan use the same address for the input and output data, 3) when you execute the plan use the same address for inout and output data. I have done all this stuff but I keep getting wrong results. I am doing a 2D FFT of a 2x2 array with values [[1,1],[1,1]]. The expected result (according to Matlab) is a 2x2 array with values [[4+0i, 0+0i], [0+0i, 0+0i]]. I get this result when I do the out-of-place transform. But when I do the in-place transform I get the following [[2+0i, 0+0i],[2+0i,0+0i]]. I chose a 2D FFT of 2x2 size because the length of the input and output data is the same and it helps for debugging. Here is my code:

bool inplace = true; // true for in-place, false for out-of-place
int dim_size[] = {2,2};
int N[] = {2,2};
int data_length     = N[0]*(N[1]);      //  2 * (2)     = 4
int data_fft_length = N[0]*(N[1]/2+1);  //  2 * (2/2+1) = 4
float* h_data_r = nullptr;              //  fftw data array
fftwf_complex* h_data_c = nullptr;      //  fftw data array (only used in out-of-place tranforms)

//  allocate fftw memory
if(inplace) {
    h_data_r = (float*)fftwf_malloc(data_fft_length*sizeof(fftwf_complex));
    h_data_c = (fftwf_complex*)h_data_r;
} else {
    h_data_r = (float*)fftwf_malloc(data_length*sizeof(float));
    h_data_c = (fftwf_complex*)fftwf_malloc(data_fft_length*sizeof(fftwf_complex));
}

//  create plane
unsigned int flags = FFTW_MEASURE;
fftwf_plan m_plan = fftwf_plan_dft_r2c_2d(N[0],N[1],h_data_r,h_data_c,flags);

//  initialize data array
h_data_r[0] = 1;
h_data_r[1] = 1;
h_data_r[2] = 1;
h_data_r[3] = 1;

//  execute fft plan
fftwf_execute(m_plan);

std::cout << "result:" << std::endl;
for(int i = 0; i < data_fft_length; ++i)
    std::cout << "[" << i << "]: " << h_data_c[i][0] << " " << h_data_c[i][1]  << std::endl;

The variable 'inplace' desides if the FFT transform is in-place or not. Could someone tell me what is wrong with my code? The code is as simple as it gets. I do not do anything special. I just want an in-place FFT real-to-complex transform. If you can't be bothered checking my code but you have a very simple in-place fft transform code with fftw3 please feel free to just copy-paste it.

Thank you.

EDIT 1: I simplified the code even more, now I use fftwf_plan_dft_r2c_2d() for plan creation and fftwf_execute() for plan execution. The problem still occurs.

EDIT 2: I translated the code to cufft which is supposed to have almost the same syntax with fftw3. I get the same problem with cufft. But by default cuFFT has FFTW compatibility mode enabled (CUFFT_COMPATIBILITY_FFTW_PADDING). If I disable the FFTW compatibility mode using the flag CUFFT_COMPATIBILITY_NATIVE then the in-place transform works just fine with cuFFT. The strange thing is that according to cuFFT documentation CUFFT_COMPATIBILITY_FFTW_PADDING is supposed to make a difference when you do batch trasnforms. In my case I do not do any batch transform. I am even more confused now.

Upvotes: 3

Views: 2011

Answers (4)

Raul Laasner
Raul Laasner

Reputation: 1585

In your code, where you define fftwf_plan, use this instead:

fftwf_plan m_plan;
if (inplace) {
    m_plan = fftwf_plan_many_dft_r2c(2, N, 1,
                                     h_data_r, N,
                                     1, 0,
                                     h_data_c, nullptr,
                                     1, 0,
                                     flags);
} else {
    m_plan = fftwf_plan_dft_r2c_2d(N[0],N[1],h_data_r,h_data_c,flags);
}

It took me a while to figure this out and it's still not clear to me why the basic interface doesn't work. In particular, read up on the onembed parameter in the FFTW manual. This needs to be null if you want to use the same compact format for the transformed array as with the basic interface (in this simple example, you get the same result with either Nor nullptr for onembed, but in general it matters).

Upvotes: 0

ngs111
ngs111

Reputation: 1

As @user4120016 pointed out, the input data needs to be padded.

The code provided above, which uses the basic interface, prints out:

[[4+0i, 0+0i], [0+0i, 0+0i]]

for case inplace = true if the data is padded:

h_data_r[0] = 1;
h_data_r[1] = 1;
h_data_r[2] = 0;
h_data_r[3] = 0;
h_data_r[4] = 1;
h_data_r[5] = 1;
h_data_r[6] = 0;
h_data_r[7] = 0;

Upvotes: 0

AstrOne
AstrOne

Reputation: 3769

Few months ago I managed to solve this issue. It turns out I had to use the advanced plan creation interface and set the values on inembed and onembed pointers manually.

Upvotes: 1

user4120016
user4120016

Reputation: 1

The fftw3 real to complex in-place transform requires padding at the end of each row of your data in the transform buffer (see: http://www.fftw.org/doc/Multi_002dDimensional-DFTs-of-Real-Data.html#Multi_002dDimensional-DFTs-of-Real-Data). The buffer you need for you 2x2 case will be of size 2 * (2(2/2+1)) = 2x4 (i.e. 2 rows with 4 floats in each row). When you fill this from your input remember that the padding is at the end of each row, not at the end of the buffer.

Upvotes: -1

Related Questions