Reputation: 619
I want to allocate a small local array in a Numba CUDA kernel. However, I find that it does not allow a parameterized array size; only a constant size is allowed. How can I solve this?
import numba
from numba import cuda

# This works, but it has to hard-code the array size
@cuda.jit
def kernel1():
    arr = numba.cuda.local.array(3, dtype=numba.float32)

kernel1[2, 2]()
# I want this, but it does not work
@cuda.jit
def kernel2(dim):
    arr = numba.cuda.local.array(dim, dtype=numba.float32)

kernel2[2, 2](3)
Below is the error message:
TypingError: Failed in cuda mode pipeline (step: nopython frontend)
No implementation of function Function(<function local.array at 0x7f074e54dee0>) found for signature:
>>> array(int64, dtype=class(float32))
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload of function 'array': File: numba/cuda/cudadecl.py: Line 44.
With argument(s): '(int64, dtype=class(float32))':
No match.
During: resolving callee type: Function(<function local.array at 0x7f074e54dee0>)
During: typing of call at /tmp/ipykernel_18276/1701838372.py (3)
File "../../../../../tmp/ipykernel_18276/1701838372.py", line 3:
<source missing, REPL/exec in use?>
Upvotes: 1
Views: 690
Reputation: 72348
I find that it does not allow parameterized array size. Only a constant size is allowed. How can I solve this?
You can’t. As you say, only a constant size is allowed. This isn’t a Numba limitation; it is a limitation of the CUDA programming model. Thread-local memory is always statically allocated by the compiler.
There may be some meta-programming tricks you can try, analogous to C++ templates, but that will only leave you with multiple versions of the kernel with different statically compiled local array sizes, not true runtime dynamic allocation.
Upvotes: 3