rozyang

Reputation: 619

How to parameterize the size of cuda.local.array in Numba?

I want to allocate a small local array in a Numba CUDA kernel. However, I find that it does not allow a parameterized array size; only a constant size is allowed. How can I solve this?

import numba
from numba import cuda

# This works, but the array size has to be hard-coded
@cuda.jit
def kernel1():
    arr = numba.cuda.local.array(3, dtype=numba.float32)

kernel1[2,2]()


# I want this, but it does not work
@cuda.jit
def kernel2(dim):
    arr = numba.cuda.local.array(dim, dtype=numba.float32)

kernel2[2,2](3)

Below is the error message

TypingError: Failed in cuda mode pipeline (step: nopython frontend)
No implementation of function Function(<function local.array at 0x7f074e54dee0>) found for signature:
 
 >>> array(int64, dtype=class(float32))
 
There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload of function 'array': File: numba/cuda/cudadecl.py: Line 44.
    With argument(s): '(int64, dtype=class(float32))':
   No match.

During: resolving callee type: Function(<function local.array at 0x7f074e54dee0>)
During: typing of call at /tmp/ipykernel_18276/1701838372.py (3)


File "../../../../../tmp/ipykernel_18276/1701838372.py", line 3:
<source missing, REPL/exec in use?>

Upvotes: 1

Views: 690

Answers (1)

talonmies

Reputation: 72348

I find that it does not allow a parameterized array size; only a constant size is allowed. How can I solve this?

You can’t. As you say, only a constant size is allowed. This isn’t a Numba limitation, it is a limitation of the CUDA programming model. Thread local memory is always statically allocated by the compiler.

There may be some meta-programming tricks you can try, analogous to C++ templates, but those will only leave you with multiple versions of the kernel, each with a different statically compiled local array size, not true runtime dynamic allocation.
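
As a minimal sketch of one such trick: Numba freezes closure variables as compile-time constants, so a hypothetical factory function (here called make_kernel, a name I'm assuming for illustration) can bake a given size into each freshly compiled kernel.

import numpy as np
import numba
from numba import cuda

def make_kernel(size):
    # `size` is captured from the enclosing scope and frozen as a
    # compile-time constant when this kernel is compiled, so
    # cuda.local.array sees a constant shape.
    @cuda.jit
    def kernel(out):
        arr = cuda.local.array(size, dtype=numba.float32)
        for i in range(size):
            arr[i] = i
        out[cuda.grid(1)] = arr[size - 1]
    return kernel

# Each distinct size produces a separately compiled kernel.
kernel3 = make_kernel(3)
kernel5 = make_kernel(5)

out = np.zeros(4, dtype=np.float32)
kernel3[2,2](out)

Every new size passed to the factory pays a fresh compilation cost, so this only makes sense when the set of sizes is small and known ahead of time; it is still static allocation, not runtime allocation.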

Upvotes: 3
