Reputation: 1411
I'm trying to implement a transform which, at some stage, uses a lookup table of less than 1 KB in size. This seems like it shouldn't pose a problem to a modern graphics card.
But the code below is failing with an unknown error:
from numba import cuda, vectorize
import numpy as np
tmp = np.random.uniform(0, 100, 1000000).astype(np.int16)
tmp_device = cuda.to_device(tmp)
lut = np.arange(100).astype(np.float32) * 2.5
lut_device = cuda.to_device(lut)
@cuda.jit(device=True)
def lookup(x):
return lut[x]
@vectorize("float32(int16)", target="cuda")
def test_lookup(x):
return lookup(x)
test_lookup(tmp_device).copy_to_host() # <-- fails with cuMemAlloc returning UNKNOWN_CUDA_ERROR
What am I doing against the spirit of numba.cuda?
Even replacing lookup with the following simplified code results in the same error:
@cuda.jit(device=True)
def lookup(x):
return x + lut[1]
Once this error occurs, I am essentially no longer able to use the CUDA context at all. For instance, allocating a new array via cuda.to_device results in:
numba.cuda.cudadrv.driver.CudaAPIError: [719] Call to cuMemAlloc results in UNKNOWN_CUDA_ERROR
Running on: 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)
Driver Version: 390.25
numba: 0.33.0
Upvotes: 3
Views: 597
Reputation: 1411
The above code is fixed by modifying lookup to copy the table into constant memory inside the device function:
@cuda.jit(device=True)
def lookup(x):
lut_device = cuda.const.array_like(lut)
return lut_device[x]
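Putting it together, the complete working version looks like this (a minimal sketch assembled from the code above; the vectorize wrapper is unchanged from the question):
from numba import cuda, vectorize
import numpy as np

tmp = np.random.uniform(0, 100, 1000000).astype(np.int16)
lut = np.arange(100).astype(np.float32) * 2.5

@cuda.jit(device=True)
def lookup(x):
    # bake the host array into constant memory at compile time
    lut_device = cuda.const.array_like(lut)
    return lut_device[x]

@vectorize("float32(int16)", target="cuda")
def test_lookup(x):
    return lookup(x)

test_lookup(cuda.to_device(tmp)).copy_to_host()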
I ran multiple variations of the code, including one that merely touched the lookup table from within the kernel without using its output. That, combined with @talonmies' assertion that UNKNOWN_CUDA_ERROR usually occurs with invalid instructions, led me to think that perhaps a shared memory constraint was causing the issue.
The above change makes the whole thing work. However, I still don't have a deep understanding of why.
If anyone knows and understands why, please feel free to contribute to this answer.
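For what it's worth, another pattern I believe sidesteps the closure entirely is to pass the table in as an explicit kernel argument using a plain @cuda.jit kernel instead of @vectorize. A sketch (the kernel name, grid sizing, and output buffer are my own, not from the original code):
from numba import cuda
import numpy as np

@cuda.jit
def lookup_kernel(indices, table, out):
    # each thread handles one element; the table arrives as an ordinary
    # global-memory argument rather than a captured global
    i = cuda.grid(1)
    if i < indices.shape[0]:
        out[i] = table[indices[i]]

tmp = np.random.uniform(0, 100, 1000000).astype(np.int16)
lut = np.arange(100).astype(np.float32) * 2.5
out = np.zeros(tmp.size, dtype=np.float32)

threads = 256
blocks = (tmp.size + threads - 1) // threads
lookup_kernel[blocks, threads](tmp, lut, out)  # numba transfers host arrays automatically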
Upvotes: 1