Reputation: 885
I'm trying to compute the FFT of a signal with CUDA in C#, using the managedCUDA library. My implementation works, but it looks horrible and is not efficient at all.
Here is my working code:
    public static float[] doFFT(float[] data)
    {
        CudaContext cntxt = new CudaContext();
        CudaDeviceVariable<float2> devData = new CudaDeviceVariable<float2>(data.Length);
        float2[] fData = new float2[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            fData[i].x = data[i];
        }
        devData.CopyToDevice(fData);
        CudaFFTPlanMany fftPlan = new CudaFFTPlanMany(1, new int[] { data.Length }, 1, cufftType.R2C);
        fftPlan.Exec(devData.DevicePointer, TransformDirection.Forward);
        float2[] result = new float2[data.Length];
        devData.CopyToHost(result);
        fftPlan.Dispose();
        float[][] res = new float[2][];
        res[0] = new float[data.Length];
        res[1] = new float[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            res[0][i] = result[i].x;
            res[1][i] = result[i].y;
        }
        return res[0];
    }
How can I avoid manually copying my signal data into a float2 array? And how can I avoid copying the complex result back into a one-dimensional float array (just the real part of each complex number)?
For example, I tried copying the data directly:
    CudaDeviceVariable<float> devData = data;
as I found somewhere, but I get wrong results :-/
Thanks a lot!
Juraj
Upvotes: 0
Views: 1327
Reputation: 1024
You need to pad the input array to the right size if you're doing in-place R2C FFTs. See also the CUFFT documentation, chapter 2.4, "Data Layout", page 6. No conversion on the host side is necessary. The following example code should point you in the right direction, although it is not complete (e.g. there is no cleanup at the end):
    //Context creation is not necessary if CudaFFTPlan is the first CUDA-related call:
    //CUFFT creates a new context if none exists, and we can use it implicitly afterwards
    //CudaContext ctx = new CudaContext();
    float[] h_dataIn = new float[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 };
    //Caution: array sizes matter! See the CUFFT documentation...
    int size_real = h_dataIn.Length;
    int size_complex = (int)Math.Floor(size_real / 2.0) + 1;
    CudaFFTPlanMany fftPlan = new CudaFFTPlanMany(1, new int[] { size_real }, 1, cufftType.R2C);
    //d_data must be padded for in-place R2C transforms: size_complex * 2 floats, not size_real
    CudaDeviceVariable<float> d_data = new CudaDeviceVariable<float>(size_complex * 2);
    //The device allocation and the host array have different sizes, so the number of bytes to copy must be given explicitly:
    d_data.CopyToDevice(h_dataIn, 0, 0, size_real * sizeof(float));
    //Execute the plan
    fftPlan.Exec(d_data.DevicePointer, TransformDirection.Forward);
    //Copy the output to the host, either as float2 or as float, but the array sizes must be right!
    float2[] h_dataOut = new float2[size_complex];
    float[] h_dataOut2 = new float[size_complex * 2];
    d_data.CopyToHost(h_dataOut);
    d_data.CopyToHost(h_dataOut2);
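To make the size arithmetic above concrete, here is a minimal host-side sketch in plain C#. It makes no managedCUDA calls, and the class `R2CLayout` and its method names are hypothetical helpers invented for illustration, not part of any library. It assumes the float output is interleaved as (real, imaginary) pairs, which is how a float2 array viewed as floats is laid out. If only the real parts are wanted, as in the question, a single pass over that buffer is enough:

```csharp
using System;

// Hypothetical helper, not part of managedCUDA: size arithmetic for an
// in-place real-to-complex (R2C) FFT and extraction of the real parts.
public static class R2CLayout
{
    // An R2C FFT of n real samples yields floor(n / 2) + 1 complex bins.
    public static int ComplexCount(int n)
    {
        return n / 2 + 1;
    }

    // An in-place buffer must hold all output bins as interleaved
    // (re, im) float pairs, so it needs 2 * ComplexCount(n) floats.
    public static int PaddedFloatCount(int n)
    {
        return 2 * ComplexCount(n);
    }

    // Picks the real parts out of an interleaved buffer
    // [re0, im0, re1, im1, ...].
    public static float[] RealParts(float[] interleaved)
    {
        if (interleaved.Length % 2 != 0)
            throw new ArgumentException("Expected interleaved (re, im) pairs.");

        float[] re = new float[interleaved.Length / 2];
        for (int k = 0; k < re.Length; k++)
        {
            re[k] = interleaved[2 * k];
        }
        return re;
    }
}
```

For the 10-sample input above, `R2CLayout.ComplexCount(10)` is 6 and `R2CLayout.PaddedFloatCount(10)` is 12, matching `size_complex` and the length of `h_dataOut2`. Calling `R2CLayout.RealParts(h_dataOut2)` would then give the 6 real parts.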
Upvotes: 3