Reputation: 555
I have been trying out CUDA recently and ran into a problem with the following CUDA kernel.
__global__ void addKernel(float *c, const float *a, const float *b, int nsize)
{
    int blockID = blockIdx.x + blockIdx.y*gridDim.x;
    int i = blockID*blockDim.x+threadIdx.x;
    if (i < nsize){
        c[i] = a[i] + b[i];
    }
    float k = c[i];
}
This kernel does a simple vector addition. It works fine without the last statement, float k = c[i];. But after I added this statement, I get an "unspecified launch failure" error when I run the code. Can anyone tell me what's wrong with this kernel?
Upvotes: 0
Views: 1130
Reputation: 151809
You really should show complete code, including the actual device memory allocations and the way you are launching this kernel (blocks, threads, etc.). But very likely you are launching more threads than are needed to cover the work size (i.e. the vector length). That's a fairly common CUDA practice.
When you do that, it's customary to include a thread-check in your kernel:
if (i < nsize){
to make sure that the i values that actually get used for indexing are valid (i.e. within the vector length).
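As a rough sketch of why the thread-check is needed: the grid size is usually rounded up, so the last block tends to contain threads whose i is past the end of the vectors. The example below assumes a 1D launch, a block size of 256, equal-length inputs, and a hypothetical host wrapper named launchAdd; none of these come from the question, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Guarded vector addition: threads past the end of the vectors do nothing.
__global__ void addGuarded(float *c, const float *a, const float *b, int nsize)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nsize) {                 // thread-check
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper (assumes a.size() == b.size()).
std::vector<float> launchAdd(const std::vector<float> &a, const std::vector<float> &b)
{
    const int nsize = static_cast<int>(a.size());
    std::vector<float> c(nsize);
    if (nsize == 0) return c;

    const size_t bytes = nsize * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                             // assumed block size
    const int blocks = (nsize + threads - 1) / threads;  // round up
    addGuarded<<<blocks, threads>>>(dC, dA, dB, nsize);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

With nsize = 1000 this launches 4 blocks of 256 threads, i.e. 1024 threads, so 24 threads compute an i of 1000 or more and must be filtered out by the check.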
But then you've broken things by including this statement outside the thread-check (i.e. outside the body of the if-statement):
float k = c[i];
Now, for every computed i in your kernel, an attempt will be made to index into the c vector at that location, even if i is greater than or equal to nsize, which is presumably the length of the c vector.
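One possible fix, sketched under the assumption that k is only needed for valid elements, is to move the read of c[i] inside the thread-check. The kernel name addKernelFixed is made up for this illustration; the index computation is copied from the question.

```cuda
#include <cuda_runtime.h>

// Same indexing as the kernel in the question, but every access to c[i]
// now sits inside the thread-check.
__global__ void addKernelFixed(float *c, const float *a, const float *b, int nsize)
{
    int blockID = blockIdx.x + blockIdx.y * gridDim.x;
    int i = blockID * blockDim.x + threadIdx.x;
    if (i < nsize) {
        c[i] = a[i] + b[i];
        float k = c[i];   // safe: i is known to be < nsize here
        (void)k;          // k is unused in this sketch; silences the warning
    }
}
```

Threads whose i is nsize or larger now skip the body entirely and never touch c.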
So most likely this statement is indexing out-of-range for the c vector allocation. You can confirm this with a bit more debugging, perhaps using a method such as what is described here.
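The linked method is not reproduced here. As one common way to get more information, a minimal sketch of runtime error checking follows. The CUDA_CHECK macro, the addChecked kernel, and the launchAndCheck function are hypothetical names invented for this illustration, and a 1D launch with 256 threads per block is assumed.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: report a failed CUDA runtime call with its location
// and return the error code from the enclosing function.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            return err_;                                              \
        }                                                             \
    } while (0)

__global__ void addChecked(float *c, const float *a, const float *b, int nsize)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nsize) {
        c[i] = a[i] + b[i];
    }
}

// Launches the kernel on device-accessible pointers and returns the first
// error seen, or cudaSuccess if none was reported.
cudaError_t launchAndCheck(float *dC, const float *dA, const float *dB, int nsize)
{
    const int threads = 256;                             // assumed block size
    const int blocks = (nsize + threads - 1) / threads;  // round up
    addChecked<<<blocks, threads>>>(dC, dA, dB, nsize);
    CUDA_CHECK(cudaGetLastError());        // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran
    return cudaSuccess;
}
```

An invalid access inside a kernel typically surfaces at the synchronize call, although not every out-of-range access is guaranteed to be detected this way. Running the program under NVIDIA's compute-sanitizer tool is another option for locating the offending access.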
Upvotes: 1