Ono

Reputation: 1357

cuFFT runs slowly - any way to accelerate?

I am using cuFFT to compute a 1D FFT along each row of a matrix, and of a single array. The matrix is 512 (x) × 720 (y), and the array is 512 × 1. That is, the FFT is applied to each of the matrix's 720 rows of 512 elements, and once to the array.

However, this operation turns out to be really slow, taking about one second. Is that normal, or is there any way to speed up the code?

Here is my code (from NVIDIA sample code):

void FFTSinoKernel(cufftComplex* boneSinoF, 
                   cufftComplex* kernelF,
                   int nChanDetX,    // 512
                   int nView)        // 720
{
    cufftHandle plan;

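    // FFT of the sinogram: nView batched transforms of length nChanDetX
    cufftPlan1d(&plan, nChanDetX, CUFFT_C2C, nView);
    cufftExecC2C(plan, boneSinoF, boneSinoF, CUFFT_FORWARD);
    cufftDestroy(plan);   // release this plan before the handle is reused

    // FFT of the kernel: a single transform of length nChanDetX
    cufftPlan1d(&plan, nChanDetX, CUFFT_C2C, 1);
    cufftExecC2C(plan, kernelF, kernelF, CUFFT_FORWARD);
    cufftDestroy(plan);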
}

I tried to use cufftExecR2C(), but I think that function has a bug, because my DC component shifts by 1 or 2 units with each row, so I have filed a bug report. For now, cufftExecC2C() gives me the right results, so I have decided to stick with it.

UPDATE:

Interestingly, I found that if I call this function again, it runs significantly faster, in less than 10 ms. So the first time cuFFT is called, it is slow; afterwards, it becomes much faster. I don't understand why the first call is slow or how to avoid it. Has anyone had a similar experience? Thanks.

Upvotes: 2

Views: 1260

Answers (1)

hotpaw2

Reputation: 70673

Move the FFT initialization (plan creation) outside of the performance-critical loop. The setup code has to allocate memory and evaluate O(N) transcendental functions (such as the sine/cosine twiddle factors), which can be much slower than the O(N log N) simple arithmetic operations inside the FFT computation itself.

Upvotes: 3
