Matan Marciano

Reputation: 333

CUDA fft different results from MATLAB fft

I tried a simple FFT and compared the results between MATLAB and CUDA (cuFFT).

MATLAB: a vector of the 9 numbers 1 to 9:

I = [1 2 3 4 5 6 7 8 9];

and use this code:

fft(I)

gives the results:

  45.0000 + 0.0000i
  -4.5000 +12.3636i
  -4.5000 + 5.3629i
  -4.5000 + 2.5981i
  -4.5000 + 0.7935i
  -4.5000 - 0.7935i
  -4.5000 - 2.5981i
  -4.5000 - 5.3629i
  -4.5000 -12.3636i

And CUDA code:

int FFT_Test_Function() {

    int n = 9;

    double* in = new double[n];
    Complex* out = new Complex[n];

    for (int i = 0; i<n; i++)
    {
        in[i] = i + 1;
    }

    // Allocate the buffer
    cufftDoubleReal *d_in;
    cufftDoubleComplex *d_out;
    unsigned int out_mem_size = sizeof(cufftDoubleComplex)*n;
    unsigned int in_mem_size = sizeof(cufftDoubleReal)*n;
    cudaMalloc((void **)&d_in, in_mem_size);
    cudaMalloc((void **)&d_out, out_mem_size);

    // Save time stamp
    milliseconds timeStart = getCurrentTimeStamp();

    cufftHandle plan;
    cufftResult res = cufftPlan1d(&plan, n, CUFFT_D2Z, 1);
    if (res != CUFFT_SUCCESS) { cout << "cufft plan error: " << res << endl; return 1; }
    cudaCheckErrors("cuda malloc fail");

    cudaMemcpy(d_in, in, in_mem_size, cudaMemcpyHostToDevice);
    cudaCheckErrors("cuda memcpy H2D fail");

    res = cufftExecD2Z(plan, d_in, d_out);
    if (res != CUFFT_SUCCESS) { cout << "cufft exec error: " << res << endl; return 1; }
    cudaMemcpy(out, d_out, out_mem_size, cudaMemcpyDeviceToHost);
    cudaCheckErrors("cuda memcpy D2H fail");

    milliseconds timeEnd = getCurrentTimeStamp();
    milliseconds totalTime = timeEnd - timeStart;
    std::cout << "Total time: " << totalTime.count() << std::endl;

    return 0;
}

With this CUDA code I got the following result:

[image: CUDA output]

You can see that CUDA gives 4 zeros (cells 5-9).

What am I missing?

Thank you very much for your attention!

Upvotes: 1

Views: 517

Answers (1)

Paul R

Reputation: 212929

CUFFT_D2Z is a real-to-complex FFT, so cuFFT only writes the first N/2 + 1 output points (integer division, so 5 points for N = 9). The remaining points (4 of them here) are redundant: they are just the complex conjugates of earlier terms in the transform, X[N - k] = conj(X[k]). You can see this in the MATLAB output if you compare pairs of terms that are mirrored about the mid-point.

You can fill in these "missing" terms if you need them, by just taking the complex conjugate of each corresponding term, but usually there isn't much point in doing this.
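If you do want the full N-point spectrum for a direct comparison with MATLAB, a minimal host-side sketch could look like the following. It assumes the first N/2 + 1 bins have already been copied back from the device into a std::vector of std::complex<double>; fill_hermitian is a hypothetical helper name, not part of cuFFT.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuFFT function): expand the n/2 + 1
// non-redundant bins of a real-input FFT of length n into the full
// n-point spectrum, using the symmetry X[n - k] = conj(X[k]).
std::vector<std::complex<double>> fill_hermitian(
    const std::vector<std::complex<double>>& half, std::size_t n)
{
    std::vector<std::complex<double>> full(n);
    const std::size_t n_unique = n / 2 + 1;  // bins actually computed

    // Copy the bins that the real-to-complex transform produced.
    for (std::size_t k = 0; k < n_unique && k < half.size(); ++k) {
        full[k] = half[k];
    }
    // Mirror the remaining bins as complex conjugates.
    for (std::size_t k = n_unique; k < n; ++k) {
        full[k] = std::conj(full[n - k]);
    }
    return full;
}
```

For the 9-point example in the question, passing the first 5 bins and n = 9 fills bins 5 to 8 (0-based) with the conjugates of bins 4 down to 1, which is the pattern in the MATLAB output.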

Upvotes: 3
