gui

Reputation: 425

Efficient 2D FFT of fixed length real input data in C/C++

I'm developing an algorithm that calls an FFT function many times. I have tight time constraints (real-time operation is the goal), so I need to minimize the time spent in each FFT call.

I'm working with the OpenCV library and have already implemented my code with two different approaches:

My input data is always a real image of fixed size, 512x512 pixels. If I implement the FFT myself, based on the mathematical definition of the DFT and with precomputed sine/cosine tables, can I achieve better performance, or is the FFTW library already so well optimized that this is not worth it? Any better ideas?
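For reference, a hand-rolled, table-based transform of the kind described above might look like the minimal sketch below. It is an iterative radix-2 FFT (not the O(N^2) DFT definition) with the sine/cosine values computed once for a fixed power-of-two size, applied first to rows and then to columns. The names `FftTable`, `fft1d` and `fft2d` are hypothetical, and nothing here is claimed to be competitive with FFTW; it only shows what would have to be written and tuned.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::complex<float> cpx;

// Twiddle factors exp(-2*pi*i*k/n) for k in [0, n/2), computed once.
// Assumption: n is a power of two (512 in the question).
struct FftTable {
    std::size_t n;
    std::vector<cpx> twiddle;
    explicit FftTable(std::size_t size) : n(size), twiddle(size / 2) {
        const double pi = 3.14159265358979323846;
        for (std::size_t k = 0; k < n / 2; ++k) {
            const double a = -2.0 * pi * double(k) / double(n);
            twiddle[k] = cpx(float(std::cos(a)), float(std::sin(a)));
        }
    }
};

// In-place forward FFT of t.n elements spaced `stride` apart.
void fft1d(cpx* x, std::size_t stride, const FftTable& t) {
    const std::size_t n = t.n;
    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(x[i * stride], x[j * stride]);
    }
    // Butterfly passes, reading twiddles from the table.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const std::size_t half = len / 2, step = n / len;
        for (std::size_t start = 0; start < n; start += len) {
            for (std::size_t k = 0; k < half; ++k) {
                const cpx u = x[(start + k) * stride];
                const cpx v = x[(start + k + half) * stride] * t.twiddle[k * step];
                x[(start + k) * stride] = u + v;
                x[(start + k + half) * stride] = u - v;
            }
        }
    }
}

// 2D FFT of a square, row-major n x n image: all rows, then all columns.
void fft2d(std::vector<cpx>& img, const FftTable& t) {
    const std::size_t n = t.n;
    for (std::size_t r = 0; r < n; ++r) fft1d(&img[r * n], 1, t);
    for (std::size_t c = 0; c < n; ++c) fft1d(&img[c], n, t);
}
```

For the 512x512 case one would build `FftTable t(512)` once, copy the image into a `std::vector<cpx>` of 512*512 elements with zero imaginary parts, and call `fft2d(img, t)` per frame. The column pass walks memory with a stride of n, which is one of the cache effects that tuned libraries handle and this sketch does not.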

All ideas and suggestions will be really appreciated. For now, I'm not considering parallelization or a GPU implementation.

Thank you

Update:

System: Intel Xeon 5130 2.0GHz CPU, Windows 7, Visual Studio 10.0, FFTW 3.3.3 (compiled following the instructions on the site), OpenCV 2.4.3.

Code example for the FFT call with FFTW (input: OpenCV Mat of type CV_32F, 1 channel, float; output: OpenCV Mat of type CV_32FC2, 2 channels, float):

// I is the input image: cv::Mat of type CV_32F (single channel, float).
int height = I.rows;
int width  = I.cols;
int N      = height * width;

fftwf_complex *data_in = ( fftwf_complex* )fftwf_malloc( sizeof( fftwf_complex ) * N );
fftwf_complex *fft     = ( fftwf_complex* )fftwf_malloc( sizeof( fftwf_complex ) * N );

// Create the plan before filling data_in: planning with FFTW_MEASURE overwrites the arrays.
fftwf_plan plan_f = fftwf_plan_dft_2d( height, width, data_in, fft, FFTW_FORWARD, FFTW_MEASURE );

// Copy the real image into the complex input buffer with a zero imaginary part.
for( int i = 0, k = 0; i < height; ++i ) {
    const float* row = I.ptr<float>(i);
    for( int j = 0; j < width; ++j, ++k ) {
        data_in[k][0] = row[j];
        data_in[k][1] = 0.0f;
    }
}

fftwf_execute( plan_f );

// Write the output matrix: RealFFT[0],ImaginaryFFT[0],RealFFT[1],ImaginaryFFT[1],...
// The Mat allocates and owns its buffer, so nothing is leaked when it is returned.
Mat fft_I( height, width, CV_32FC2 );
for( int i = 0, k = 0; i < height; ++i ) {
    float* out = fft_I.ptr<float>(i);
    for( int j = 0; j < width; ++j, ++k ) {
        out[2 * j]     = fft[k][0];
        out[2 * j + 1] = fft[k][1];
    }
}

fftwf_destroy_plan( plan_f );
fftwf_free( data_in );
fftwf_free( fft );

return fft_I;

Upvotes: 4

Views: 2829

Answers (3)

Paul R

Reputation: 212929

Your FFT time with FFTW seems very high. To get the best out of FFTW with fixed-size FFTs, you should generate a plan using the FFTW_PATIENT flag and then ideally save the generated "wisdom" for subsequent re-use. You can generate wisdom either from your own code or using the fftw-wisdom tool.

Upvotes: 3

Jason B

Reputation: 12975

The FFT from the Intel Math Kernel Library (available separately from the Intel compiler) is faster than FFTW most of the time. I don't know whether the improvement in your case would be enough to justify the price, though.

I agree with the others that rolling your own FFT is probably not a good use of your time (unless you want to learn how to do it). The available FFT implementations (FFTW, MKL) have been finely tuned over many years. I'm not saying that you can't do better, but it would probably take a lot of work and time for marginal gains.

Upvotes: 1

kobra

Reputation: 1315

Believe me, FFTW is really very well optimized; there is only a very small chance that you can do better.

Which compiler did you use to compile FFTW? Sometimes the Intel compiler gives better performance than gcc.

Upvotes: 0
