Reputation: 889
I'm running CentOS release 5.9 (Final) with CUDA on a Tesla card of compute capability 1.3 (major version 1, minor version 3). The following is my kernel code:
__global__ void foo(int* pos, int t)
{
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    t = pos[index + 1] - pos[index];
    char* temp = (char*)malloc(t);
}
I want to allocate t bytes dynamically.
This gives me the error:
calling a host function("malloc") from a __device__/__global__ function("foo") is not allowed.
What can I do to solve this problem?
Upvotes: 1
Views: 975
Reputation: 72349
Because you are using a compute capability 1.3 device, in-kernel malloc and the C++ new operator are not supported; both require compute capability 2.0 or later (this is clearly explained in the CUDA C Programming Guide).
Your only alternatives are these. You can pre-allocate a scratch global memory area using host-side memory allocation; it would need to be at least the largest value of t multiplied by the number of threads launched on the GPU. This memory could either be passed as a kernel argument, or its address could be written to a constant memory pointer which the kernel can read. Alternatively, you could declare temp as a statically sized local memory array in the kernel. If it turns out you need a relatively small and a priori known value of max(t), then there may be performance benefits in templating the kernel and passing max(t) as a template parameter.
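As a rough illustration of the two alternatives, here is a minimal sketch rather than a definitive implementation. The names fooScratch, fooLocal, scratch, maxT, n and out are hypothetical and not from the question; the sketch assumes pos holds n + 1 entries, that one thread is launched per entry, and that each thread just fills its temporary buffer and writes a sum so the result can be inspected.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Alternative 1: every thread takes a private slice of a scratch buffer
// that the host allocated before the launch (assumed size: n * maxT bytes).
__global__ void fooScratch(const int* pos, int n, char* scratch, int maxT, int* out)
{
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    if (index >= n) return;                        // pos has n + 1 entries
    int t = pos[index + 1] - pos[index];
    char* temp = scratch + (size_t)index * maxT;   // this thread's slice
    int sum = 0;
    for (int i = 0; i < t; ++i) { temp[i] = 1; sum += temp[i]; }  // placeholder work
    out[index] = sum;
}

// Alternative 2: statically sized local array; MaxT is a compile-time bound
// on t, supplied as a template parameter.
template <int MaxT>
__global__ void fooLocal(const int* pos, int n, int* out)
{
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    if (index >= n) return;
    int t = pos[index + 1] - pos[index];
    char temp[MaxT];
    int sum = 0;
    for (int i = 0; i < t && i < MaxT; ++i) { temp[i] = 1; sum += temp[i]; }
    out[index] = sum;
}

int main()
{
    const int n = 4;
    int hPos[n + 1] = {0, 3, 5, 9, 10};            // made-up offsets for illustration
    int hOut[n];

    // The host computes max(t) so the scratch area can be sized up front.
    int maxT = 0;
    for (int i = 0; i < n; ++i) {
        int t = hPos[i + 1] - hPos[i];
        if (t > maxT) maxT = t;
    }

    int* dPos = 0; int* dOut = 0; char* dScratch = 0;
    cudaMalloc((void**)&dPos, sizeof(hPos));
    cudaMalloc((void**)&dOut, n * sizeof(int));
    cudaMalloc((void**)&dScratch, (size_t)n * maxT);   // threads launched * max(t)
    cudaMemcpy(dPos, hPos, sizeof(hPos), cudaMemcpyHostToDevice);

    fooScratch<<<1, n>>>(dPos, n, dScratch, maxT, dOut);
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("scratch: out[%d] = %d\n", i, hOut[i]);

    fooLocal<8><<<1, n>>>(dPos, n, dOut);          // 8 is an assumed upper bound on t
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("local:   out[%d] = %d\n", i, hOut[i]);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) printf("CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(dPos); cudaFree(dOut); cudaFree(dScratch);
    return 0;
}
```

In the scratch version each thread's slice is contiguous, which is the simplest layout but not necessarily the fastest; interleaving the slices so that neighbouring threads touch neighbouring addresses may coalesce better. The templated version only works when max(t) is known at compile time and is small enough to fit in local memory.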
Upvotes: 2