einpoklum

Reputation: 131547

What makes cuLaunchKernel fail with CUDA_ERROR_INVALID_HANDLE?

I'm launching a CUDA kernel I've compiled, using the cuLaunchKernel() driver API function. I'm passing my parameters in a kernelParams array, and passing nullptr for the extra argument.

Unfortunately, this fails with the error CUDA_ERROR_INVALID_HANDLE. Why? I checked the Driver API documentation to see in which cases the function might fail, and it discusses failure with CUDA_ERROR_INVALID_VALUE (not the same thing). It doesn't discuss the error I get.

Since more than one parameter of cuLaunchKernel() is some sort of handle, what does this failure mean? (And if there are multiple possible causes, what are they?)

Upvotes: 3

Views: 3194

Answers (4)

Muraat

Reputation: 1

I got the same error. I downgraded Python to 3.8 and installed TensorFlow again, and it works now.

Upvotes: 0

Duterfresh

Reputation: 1

Run cuobjdump -symbols myModule.cubin to check whether your function's name has been changed (mangled). If so, add extern "C" before the function's definition.

Upvotes: -1

einpoklum

Reputation: 131547

One possibility is a failure due to a CUDA driver context switch. You may have inadvertently performed some action which pushes or replaces the current context for the CUDA device. Loaded modules belong to the context they were loaded in, so the handle to your compiled and loaded kernel is not valid in the context that is now current. This triggers a CUDA_ERROR_INVALID_HANDLE failure.

Assuming this is the case, switch the context before the launch, e.g. this way:

cuCtxPushCurrent(my_driver_context);
cuLaunchKernel(/*etc. etc. */);
/* possibly */ cuCtxPopCurrent(NULL);

or like so:

cuCtxSetCurrent(my_driver_context);
cuLaunchKernel(/*etc. etc. */);

Note that you risk leaking a context if you pop and discard the only reference to a valid one. You also risk breaking other code which assumes that the context it made current is still the current one.

Upvotes: 3

Eypros

Reputation: 5723

Well, in my case it was an out-of-memory (OOM) error which, for some reason, was not reported as such. When I reduced the batch size of my model, it worked. You may want to check whether this is the case for you as well.

Upvotes: -1
