Reputation: 5106
The problem
I'm trying to copy an int array into the device's constant memory, but I keep getting the following error:
[ERROR] 'invalid argument' (11) in 'main.cu' at line '386'
The code
There's a lot of code developed, so I'm going to simplify what I have.
I've declared a device __constant__ variable at the top of my main.cu file, outside any function:
__device__ __constant__ int* dic;
I also have a host variable, flatDic, that's malloc'ed the following way, inside main():
int* flatDic = (int *)malloc(num_codewords*(bSizeY*bSizeX)*sizeof(int));
Then I try to copy the contents of flatDic into dic like so, also in main():
cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int));
This cudaMemcpyToSymbol() call is line 386 of main.cu, and it's where the aforementioned error is thrown.
What I've tried
Here's what I've tried so far to solve the problem:
I've tried all of the following, always getting the same error:
cudaMemcpyToSymbol(dic, &flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int));
cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int));
cudaMemcpyToSymbol(dic, &flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int), 0, cudaMemcpyHostToDevice);
cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int), 0, cudaMemcpyHostToDevice);
I've also tried to cudaMalloc() the dic variable before calling cudaMemcpyToSymbol(). cudaMalloc() throws no errors, but the cudaMemcpyToSymbol() error persists.
cudaMalloc((void **) &dic, num_codewords*(bSizeY*bSizeX)*sizeof(int));
I've also searched extensively through the web, documentation, forums, examples, etc., all to no avail.
Does anyone see anything wrong with my code? Thanks in advance.
Upvotes: 5
Views: 7512
Reputation: 5287
cudaMemcpyToSymbol copies to a constant variable. Here you're trying to copy multiple bytes of type int (an allocated ARRAY) to a pointer of type int *. These types are not the same: the symbol dic is only sizeof(int *) bytes, so a copy of the whole array does not fit in it, hence the 'invalid argument' error. To make this work, you will need to copy an (allocated) ARRAY of int to a (constant, static-length) device ARRAY of int, e.g.:
__device__ __constant__ int dic[LEN];
Example from the CUDA C Programming Guide (which I suggest you read -- it's quite good!):
__constant__ float constData[256];
float data[256];
cudaMemcpyToSymbol(constData, data, sizeof(data));
cudaMemcpyFromSymbol(data, constData, sizeof(data));
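Applied to the question's case, the same pattern might look like the sketch below. The capacity MAX_DIC_INTS, the helper uploadDic, and the kernel sumDic are hypothetical names introduced only for illustration, and the value 1000 stands in for num_codewords*(bSizeY*bSizeX). Constant memory is limited to 64 KB in total, so the capacity has to be chosen at compile time and the copy size checked against it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical compile-time capacity: 4096 ints = 16 KB, below the 64 KB limit.
#define MAX_DIC_INTS 4096

// A statically sized array in constant memory (an array, not a pointer).
__constant__ int dic[MAX_DIC_INTS];

// Illustrative kernel: sums the first n entries of dic on the device.
__global__ void sumDic(int n, int *out)
{
    int s = 0;
    for (int i = 0; i < n; ++i) s += dic[i];
    *out = s;
}

// Hypothetical helper: copies n ints from host memory into dic,
// rejecting sizes that would not fit in the symbol.
cudaError_t uploadDic(const int *flatDic, size_t n)
{
    if (n > MAX_DIC_INTS) return cudaErrorInvalidValue;
    return cudaMemcpyToSymbol(dic, flatDic, n * sizeof(int));
}

int main()
{
    const size_t n = 1000;  // stands in for num_codewords*(bSizeY*bSizeX)
    int *flatDic = (int *)malloc(n * sizeof(int));
    for (size_t i = 0; i < n; ++i) flatDic[i] = 1;

    cudaError_t err = uploadDic(flatDic, n);
    if (err != cudaSuccess) {
        fprintf(stderr, "upload failed: %s\n", cudaGetErrorString(err));
        free(flatDic);
        return 1;
    }

    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc((void **)&d_out, sizeof(int));
    sumDic<<<1, 1>>>((int)n, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d\n", h_out);

    cudaFree(d_out);
    free(flatDic);
    return 0;
}
```

If the dictionary can exceed the chosen capacity, a constant array is not an option and the data would have to live in global memory instead.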
To my knowledge you could also cudaMemcpyToSymbol a pointer to a pointer (unlike your example, where you're copying an array to a pointer), but beware that only the pointer will be constant, not the memory it points to on your device. If you go this route, you need to add a cudaMalloc, then cudaMemcpyToSymbol the resulting pointer to device memory into your __constant__ device variable. AGAIN, in this case the array values WILL NOT be constant -- ONLY the pointer to the memory will be.
Your call for this case would be something like:
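// File scope: a pointer stored in constant memory.
__constant__ int *c_dic_ptr;
// In main(): allocate global memory, then copy the pointer value to the symbol.
int *d_dic;
cudaMalloc((void **) &d_dic, num_codewords*(bSizeY*bSizeX)*sizeof(int));
cudaMemcpyToSymbol(c_dic_ptr, &d_dic, sizeof(int *));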
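A fuller sketch of this pointer-in-constant-memory route follows. The helper publishDic and the kernel readFirst are hypothetical names, and the eight-element host array is made-up data. Note that the array contents still have to be copied to the device with an ordinary cudaMemcpy; only the address goes through cudaMemcpyToSymbol.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Only this pointer lives in constant memory; the data it points to
// sits in ordinary global memory and is not constant.
__constant__ int *c_dic_ptr;

// Illustrative kernel: reads one element through the constant pointer.
__global__ void readFirst(int *out)
{
    *out = c_dic_ptr[0];
}

// Hypothetical helper: allocates global memory, copies n ints into it,
// then stores the device address in the constant symbol.
cudaError_t publishDic(const int *h_dic, size_t n, int **d_dic)
{
    cudaError_t err = cudaMalloc((void **)d_dic, n * sizeof(int));
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(*d_dic, h_dic, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err != cudaSuccess) return err;
    // Copy the pointer value itself: sizeof(int *) bytes, taken from d_dic.
    return cudaMemcpyToSymbol(c_dic_ptr, d_dic, sizeof(int *));
}

int main()
{
    int h_dic[8] = {42, 1, 2, 3, 4, 5, 6, 7};  // made-up data
    int *d_dic = nullptr;
    cudaError_t err = publishDic(h_dic, 8, &d_dic);
    if (err != cudaSuccess) {
        fprintf(stderr, "publish failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc((void **)&d_out, sizeof(int));
    readFirst<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("first element = %d\n", h_out);

    cudaFree(d_out);
    cudaFree(d_dic);
    return 0;
}
```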
Also you should be wrapping your CUDA calls during debugging inside error checking logic. I've borrowed the following logic from talonmies:
// Requires <cstdio> and <cstdlib>. The file parameter is const because
// __FILE__ expands to a string literal.
inline __host__ void gpuAssert(cudaError_t code, const char *file, int line,
                               bool abort = true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code),
                file, line);
        if (abort) exit(code);
    }
}
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
To call simply wrap your CUDA call in it like so:
gpuErrchk(cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int)));
The program will exit with an error message if you're having allocation issues or other common errors.
To check your kernel, do something like:
MyKernel<<<BLK,THRD>>>(vars...);
//Make sure nothing went wrong.
gpuErrchk(cudaPeekAtLastError());
gpuErrchk(cudaDeviceSynchronize());
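To see what the two checks catch, here is a small standalone sketch. The kernel noop and the helper launchAndCheck are hypothetical, and the helper returns the error instead of exiting so that both cases can run. It uses cudaGetLastError rather than cudaPeekAtLastError so that the error state is cleared between launches. The second launch asks for 2048 threads per block, which exceeds the 1024-thread limit of current devices, so it should be rejected at launch time rather than during execution.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative empty kernel.
__global__ void noop() {}

// Hypothetical helper: launches noop with the given block size and
// returns the first error encountered.
cudaError_t launchAndCheck(int threadsPerBlock)
{
    noop<<<1, threadsPerBlock>>>();
    // Launch-time errors (e.g. a bad configuration); this also clears the error state.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    // Errors raised while the kernel was actually running.
    return cudaDeviceSynchronize();
}

int main()
{
    printf("256 threads:  %s\n", cudaGetErrorString(launchAndCheck(256)));
    printf("2048 threads: %s\n", cudaGetErrorString(launchAndCheck(2048)));
    return 0;
}
```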
Thanks to talonmies for the error checking code!
Note:
Even if you were doing a vanilla cudaMemcpy, your code would fail because you haven't cudaMalloc'ed memory for your array -- in that case, though, the failure would likely be the GPU equivalent of a segfault (likely Unspecified launch failure), as the pointer would hold some junk value and you would be trying to write to the memory at the address given by that junk value.
Upvotes: 6