Sushant Yelpale

Reputation: 889

NVCC rejects my use of malloc() in kernel code

I'm running CentOS release 5.9 (Final) with CUDA on a Tesla card of compute capability 1.3 (major version 1, minor version 3). The following is my kernel code:

__global__ void foo(int* pos, int t)
{
    int index = blockDim.x * blockIdx.x + threadIdx.x; 
    t = pos [index + 1] - pos [index];  
    char* temp = (char*)malloc(t);
}

I want to allocate t bytes dynamically.

This gives me the error:

calling a host function("malloc") from a __device__/__global__ function("foo") is not allowed.

What can I do to solve this problem?

Upvotes: 1

Views: 975

Answers (1)

talonmies

Reputation: 72349

Because you are using a compute 1.3 device, in-kernel malloc and the C++ new operator are not supported; both require compute capability 2.0 or higher (this is clearly explained in the CUDA C programming guide).

On this hardware you have two alternatives. The first is to pre-allocate a scratch area in global memory using host-side memory allocation; it would need to be at least the largest value of t multiplied by the number of threads launched on the GPU. The pointer to this memory could either be passed as a kernel argument, or written to a constant memory pointer which the kernel can read. The second is to declare temp as a statically sized local memory array in the kernel. If it turns out that max(t) is relatively small and known a priori, then there may be performance benefits in templating the kernel and passing max(t) as a template parameter.
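A minimal sketch of the first alternative (a host-allocated scratch buffer passed as a kernel argument) might look like the following. The names foo_scratch, run_foo_scratch, scratch, max_t, n and out are hypothetical and not from the question, the loops are placeholder work standing in for whatever temp is actually used for, and error checking of the CUDA calls is left out for brevity. It assumes pos holds n + 1 entries and that max_t is an upper bound on every t.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread takes its own max_t-byte slice of a scratch buffer that the
// host allocated before the launch, instead of calling malloc(t).
__global__ void foo_scratch(const int* pos, int n, char* scratch,
                            int max_t, int* out)
{
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    if (index >= n) return;                        // pos has n + 1 entries

    int t = pos[index + 1] - pos[index];
    char* temp = scratch + (size_t)index * max_t;  // replaces malloc(t)

    // Placeholder work: fill the t bytes, then sum them so the host can
    // check that every thread had t usable bytes.
    for (int i = 0; i < t; ++i) temp[i] = 1;
    int sum = 0;
    for (int i = 0; i < t; ++i) sum += temp[i];
    out[index] = sum;                              // equals t
}

// Hypothetical host wrapper: allocates the scratch area, launches the
// kernel, and returns one value per thread.
std::vector<int> run_foo_scratch(const std::vector<int>& pos, int max_t)
{
    int n = (int)pos.size() - 1;
    int* d_pos = 0;
    int* d_out = 0;
    char* d_scratch = 0;

    cudaMalloc((void**)&d_pos, pos.size() * sizeof(int));
    cudaMalloc((void**)&d_out, n * sizeof(int));
    // Only threads with index < n touch the buffer, so n * max_t bytes
    // are enough here.
    cudaMalloc((void**)&d_scratch, (size_t)n * max_t);

    cudaMemcpy(d_pos, pos.data(), pos.size() * sizeof(int),
               cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    foo_scratch<<<blocks, threads>>>(d_pos, n, d_scratch, max_t, d_out);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_pos);
    cudaFree(d_out);
    cudaFree(d_scratch);
    return out;
}
```

With pos holding 0, 3, 5, 9 and max_t set to 4, the three threads compute t as 3, 2 and 4. The cost of this approach is that every thread reserves max_t bytes regardless of its actual t, so it suits cases where the values of t do not vary wildly.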

Upvotes: 2
