Reputation: 23
I'm trying to solve the following: my CUDA kernel uses several device functions, and some of them need to return an array.
I tried this:
__device__ float *MatProd2dWxC(float *a2d, float *b2d, int mGl, int nGl)
{
    int aRows = mGl; int aCols = nGl;
    int bRows = nGl; int bCols = 1;
    float *result;
    //result.resize(mGl*aRows);
    for (int i = 0; i < aRows; ++i) // each row of a
        for (int j = 0; j < bCols; ++j) // each col of b
            for (int k = 0; k < aCols; ++k)
                result[i*mGl + j] += a2d[i*mGl + k] * b2d[k*mGl + j];
    return result;
}
I haven't compiled this, because I understand that returning a pointer from the function is probably not a good idea. But what is the right way to do it? One idea is to pass in an additional temporary array and change the function to return void, but I need to call it many times in the kernel code, so I'm looking for a more elegant solution.
Upvotes: 1
Views: 2715
Reputation: 7265
Returning a pointer from a device function is fine and works as expected.
The problem in your code is that you never assign a value to the result pointer, which you then dereference and later return from the function. You need to use float *result = (float *)malloc(mGl*aRows * sizeof(float)); to allocate memory (and don't forget to free() it later!).
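As a rough illustration of the malloc()/free() approach, here is a minimal sketch. The names matVecAlloc, matVecKernel and matVecOnDevice are made up for this example and are not from the question. It assumes row-major storage with the column count as the row stride, sums into a local variable because malloc() does not zero the memory it returns, and leaves out CUDA error checking for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the question): multiplies a row-major
// rows x cols matrix by a cols x 1 column vector into a newly allocated array.
__device__ float *matVecAlloc(const float *a, const float *b, int rows, int cols)
{
    // In-kernel malloc draws from a limited device heap and may return nullptr.
    float *result = (float *)malloc(rows * sizeof(float));
    if (result == nullptr) return nullptr;

    for (int i = 0; i < rows; ++i) {
        float sum = 0.0f;  // accumulate locally: malloc does not zero memory
        for (int k = 0; k < cols; ++k)
            sum += a[i * cols + k] * b[k];
        result[i] = sum;
    }
    return result;  // ownership passes to the caller
}

__global__ void matVecKernel(const float *a, const float *b, float *out,
                             int rows, int cols)
{
    if (blockIdx.x != 0 || threadIdx.x != 0) return;  // one thread, for clarity only
    float *tmp = matVecAlloc(a, b, rows, cols);
    if (tmp == nullptr) return;  // allocation failed, out keeps its zeros
    for (int i = 0; i < rows; ++i) out[i] = tmp[i];
    free(tmp);  // released in device code by the owner
}

// Host wrapper: copies inputs to the GPU, runs the kernel, returns the product.
std::vector<float> matVecOnDevice(const std::vector<float> &a,
                                  const std::vector<float> &b, int rows, int cols)
{
    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    cudaMalloc(&dA, a.size() * sizeof(float));
    cudaMalloc(&dB, b.size() * sizeof(float));
    cudaMalloc(&dOut, rows * sizeof(float));
    cudaMemcpy(dA, a.data(), a.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), b.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, rows * sizeof(float));

    matVecKernel<<<1, 1>>>(dA, dB, dOut, rows, cols);

    std::vector<float> out(rows);
    cudaMemcpy(out.data(), dOut, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dOut);
    return out;
}

int main()
{
    std::vector<float> a = {1, 2, 3, 4, 5, 6};  // 2 x 3 matrix
    std::vector<float> b = {1, 0, 1};           // 3 x 1 vector
    std::vector<float> r = matVecOnDevice(a, b, 2, 3);
    printf("%g %g\n", r[0], r[1]);
    return 0;
}
```

Note that memory obtained from in-kernel malloc() has to be released with in-kernel free(); it cannot be passed to cudaFree() on the host.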
However, a better design would be to pass an already allocated pointer into your device function. This establishes clear ownership of the allocation (i.e. it makes clear where in your code free() should be called) and may avoid unnecessary allocations in some cases, e.g. where the allocation can be hoisted out of a loop.
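A minimal sketch of this caller-provided-buffer design might look like the following. The names matVecInto, chainKernel, applyTwiceOnDevice and the bound kMaxN are hypothetical and chosen only for this example. It assumes a square row-major matrix small enough that a fixed-size per-thread scratch array is acceptable, and it omits CUDA error checking. The device function is called twice in the kernel with no allocation or free at all.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kMaxN = 16;  // assumed upper bound on the vector length

// Hypothetical helper: writes (rows x cols matrix) * (cols x 1 vector) into
// out, which the caller must have allocated with at least rows elements.
__device__ void matVecInto(const float *a, const float *b, float *out,
                           int rows, int cols)
{
    for (int i = 0; i < rows; ++i) {
        float sum = 0.0f;
        for (int k = 0; k < cols; ++k)
            sum += a[i * cols + k] * b[k];
        out[i] = sum;
    }
}

// Computes out = A * (A * b) for an n x n matrix A.
__global__ void chainKernel(const float *a, const float *b, float *out, int n)
{
    if (blockIdx.x != 0 || threadIdx.x != 0) return;  // one thread, for clarity only
    if (n > kMaxN) return;

    float tmp[kMaxN];              // scratch owned by this thread, freed automatically
    matVecInto(a, b, tmp, n, n);   // tmp = A * b
    matVecInto(a, tmp, out, n, n); // out = A * tmp, same helper reused
}

// Host wrapper: copies inputs to the GPU, runs the kernel, returns the result.
std::vector<float> applyTwiceOnDevice(const std::vector<float> &a,
                                      const std::vector<float> &b, int n)
{
    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    cudaMalloc(&dA, a.size() * sizeof(float));
    cudaMalloc(&dB, b.size() * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dA, a.data(), a.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), b.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, n * sizeof(float));

    chainKernel<<<1, 1>>>(dA, dB, dOut, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dOut);
    return out;
}

int main()
{
    std::vector<float> a = {1, 2, 3, 4};  // 2 x 2 matrix
    std::vector<float> b = {1, 1};        // 2 x 1 vector
    std::vector<float> r = applyTwiceOnDevice(a, b, 2);
    printf("%g %g\n", r[0], r[1]);
    return 0;
}
```

Because the caller decides where each result lives, the same scratch buffer can be reused across calls, which keeps repeated use inside a kernel cheap.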
This problem is not specific to CUDA; it applies equally to standard C.
Upvotes: 2