Reputation: 194
My problem is with cudaMemcpy when copying back from device to host. My program uses a GUI written in C# plus a CUDA wrapper class, and the core CUDA logic is written in CUDA C.
This is the main C# code responsible for starting everything:
int[] imgData = srcImg.RgbData8bitInt;
int[] patData = pattern.PatternData;
int[] maskData = pattern.MaskData;
int[] Accumulator = new int[srcImg.Width * srcImg.Height];
IntPtr A_dev = CUDA.MallocInt(srcImg.Width * srcImg.Height);
IntPtr Img_dev = CUDA.MallocInt(imgData.Length);
CUDA.MemcpyToDevice(imgData, Img_dev, imgData.Length);
IntPtr Pat_dev = CUDA.MallocInt(patData.Length);
CUDA.MemcpyToDevice(patData, Pat_dev, patData.Length);
IntPtr Mask_dev = CUDA.MallocInt(maskData.Length);
CUDA.MemcpyToDevice(maskData, Mask_dev, maskData.Length);
int gridSizeX = (srcImg.Width - pattern.Image.Width) / 256 + 1;
int gridSizeY = srcImg.Height - pattern.Image.Width;
int imageWidth = srcImg.Width;
CUDA.Execute(status, gridSizeX, gridSizeY, A_dev, Img_dev, Pat_dev, Mask_dev, imageWidth);
CUDA.SynchronizeContext();
CUDA.MemcpyToHost(Accumulator, A_dev, Accumulator.Length);
By the way, CUDA.SynchronizeContext() is a wrapper for cudaThreadSynchronize().
The problematic part is the last line, which copies values from the device back to the host. Its P/Invoke declaration and the native function behind it are:
[DllImport(dllPath, CharSet = CharSet.Ansi, SetLastError = true, CallingConvention = CallingConvention.StdCall)]
private static extern int memcpyToHost(int[] srcPtr, IntPtr devPtr, int size);
extern "C" int __declspec(dllexport) __stdcall memcpyToHost(int* host, int* dev, int size)
{
    if (dev == 0) return 1;
    cudaError_t status = cudaMemcpy(host, dev, size * sizeof(int), cudaMemcpyDeviceToHost);
    if (status == cudaSuccess)
        return 0;
    else
        return 1;
}
The error status I'm getting while debugging is cudaErrorInvalidValue.
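Because memcpyToHost collapses every failure to 1, the managed side cannot tell cudaErrorInvalidValue apart from any other error without a native debugger. A minimal sketch of one way to surface the status is shown here; the name memcpyToHostVerbose is hypothetical, and the __declspec(dllexport) / __stdcall decorations are left out so the snippet stays portable. This is an illustration under those assumptions, not the wrapper actually used in the project.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of memcpyToHost (not part of the original wrapper).
// Returns the raw cudaError_t value (0 == cudaSuccess) instead of collapsing
// every failure to 1, and logs the error name and description to stderr.
extern "C" int memcpyToHostVerbose(int* host, const int* dev, int size)
{
    // Reject obviously bad arguments before touching the runtime.
    if (host == nullptr || dev == nullptr || size < 0)
        return static_cast<int>(cudaErrorInvalidValue);

    cudaError_t status = cudaMemcpy(host, dev,
                                    static_cast<std::size_t>(size) * sizeof(int),
                                    cudaMemcpyDeviceToHost);
    if (status != cudaSuccess)
    {
        std::fprintf(stderr, "cudaMemcpy (device to host) failed: %s (%s)\n",
                     cudaGetErrorName(status), cudaGetErrorString(status));
    }
    return static_cast<int>(status);
}
```

On the C# side the returned int could then be compared against the cudaError_t values rather than just 0 or 1.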
Allocating memory and copying to the device seem fine; I've stepped through both in the debugger. I'm completely at a loss here. Has anyone encountered a similar problem?
EDIT: SOLVED. See comments.
Upvotes: 0
Views: 688
Reputation: 194
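The issue was a cudaDeviceReset() call placed right after the kernel launch. cudaDeviceReset() destroys all resources associated with the current device in the current process, including device memory allocations, so the device pointers (A_dev here) were no longer valid by the time cudaMemcpy ran.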
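A minimal, self-contained sketch of the working call order follows. The kernel fillKernel, the helper check, and the buffer size are hypothetical stand-ins rather than the code from the question; the only point is that cudaDeviceReset() comes after the device-to-host copy and the free, not between the kernel launch and the copy.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel: writes each element's index into the buffer.
__global__ void fillKernel(int* acc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) acc[i] = i;
}

// Hypothetical helper: reports the failing step and returns false on error.
static bool check(cudaError_t status, const char* what)
{
    if (status == cudaSuccess) return true;
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
    return false;
}

int main()
{
    const int n = 1024;
    std::vector<int> host(n, 0);
    int* dev = nullptr;

    if (!check(cudaMalloc(&dev, n * sizeof(int)), "cudaMalloc")) return 1;

    fillKernel<<<(n + 255) / 256, 256>>>(dev, n);
    if (!check(cudaGetLastError(), "kernel launch")) return 1;
    if (!check(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

    // A cudaDeviceReset() at this point would release dev along with every
    // other device resource, so the copy below would be handed a stale pointer.
    if (!check(cudaMemcpy(host.data(), dev, n * sizeof(int),
                          cudaMemcpyDeviceToHost), "cudaMemcpy")) return 1;

    // Clean up only after the results are safely on the host.
    cudaFree(dev);
    cudaDeviceReset();

    std::printf("host[10] = %d\n", host[10]);
    return 0;
}
```

Checking cudaGetLastError() right after the launch and the return value of the synchronize call also helps separate launch or execution failures from a bad pointer passed to the copy.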
Upvotes: 1