Have GTX Titan, having trouble with Dynamic Parallelism

Question

I'm trying to call a CUDA kernel from another kernel, but I get the following error:

Traceback (most recent call last):
  File "C:\temp\GPU Program Shell.py", line 22, in <module>
    """)
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 262, in __init__
    arch, code, cache_dir, include_dirs)
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 252, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir)
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 134, in compile_plain
    cmdline, stdout=stdout.decode("utf-8"), stderr=stderr.decode("utf-8"))
pycuda.driver.CompileError: nvcc compilation of c:\users\karste~1\appdata\local\temp\tmpgq8t45\kernel.cu failed
[command: nvcc --cubin -arch sm_35 -m64 -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu]
[stderr:
kernel.cu(14): error: kernel launch from __device__ or __global__ functions requires separate compilation mode

My understanding is that this has to do with Dynamic Parallelism, and the only other question I found about this error involved a user without appropriate hardware. I have a GTX Titan, however, so it should be compatible. What am I missing?

EDIT

After adding options=['--cubin','-rdc=true' ,'-lcudart', '-lcudadevrt,','-Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu'] to the SourceModule call, I get the following error:

Traceback (most recent call last):
  File "C:\temp\GPU Program Shell.py", line 22, in <module>
    """, options=['--cubin','-rdc=true' ,'-lcudart', '-lcudadevrt,','-Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu'])
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 265, in __init__
    self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: not found -
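A note on the options list in this edit: PyCUDA passes each list element to nvcc as a single argument, so '-lcudadevrt,' asks for a library literally named "cudadevrt,", and '-Ic:\...\cuda kernel.cu' is one include path ending in " kernel.cu" (it looks copied from the displayed command line, where PyCUDA's own -I path and the source file were joined by a space). '--cubin' is redundant, since PyCUDA already adds it. The sketch below flags these with a crude heuristic; the helper name suspicious_options is hypothetical and the checks are illustrative only.

```python
# The options list from the edit. A raw string keeps the backslashes literal.
edited_options = ['--cubin', '-rdc=true', '-lcudart', '-lcudadevrt,',
                  r'-Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu']

# Same intent with only the formatting problems removed.
cleaned_options = ['-rdc=true', '-lcudart', '-lcudadevrt']


def suspicious_options(options):
    """Hypothetical heuristic: flag items that look like fused arguments,
    carry stray punctuation, or duplicate what PyCUDA already passes."""
    problems = []
    for opt in options:
        if opt.endswith(","):
            problems.append((opt, "trailing comma becomes part of the argument"))
        if opt.startswith("-I") and opt.endswith(".cu"):
            problems.append((opt, "include path and source file fused into one argument"))
        if opt == "--cubin":
            problems.append((opt, "PyCUDA already passes --cubin"))
    return problems


if __name__ == "__main__":
    for opt, reason in suspicious_options(edited_options):
        print(repr(opt), "->", reason)
```

Cleaning the list is unlikely to remove the load error by itself. With --cubin, nvcc stops after compiling, so -rdc=true produces a relocatable cubin whose calls into the device runtime (what -lcudadevrt would supply) are still unresolved, which is a plausible cause of the "not found" from cuModuleLoadDataEx. Later PyCUDA releases added pycuda.compiler.DynamicSourceModule to perform that device-link step; check whether your installed version provides it.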

Robert Crovella · Accepted Answer

PyCUDA compiles the CUDA code on the fly, using this command:

nvcc --cubin -arch sm_35 -m64 -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu

In order to compile code containing dynamic parallelism, it's necessary to add specific switches to the compile command to enable separate compilation, device code linking, linking of the device runtime library, and the appropriate architecture target (sm_35).

Some examples of valid nvcc command combinations are given in the dynamic parallelism section of the CUDA C Programming Guide.

Your command line should look something like:

nvcc --cubin -arch=sm_35 -m64 -rdc=true -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu -lcudadevrt
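As a rough sketch, the same command can be built as an argument list in Python and handed to subprocess, which keeps the include path and the source file as separate arguments instead of relying on one quoted string. The helper name dynamic_parallelism_nvcc_cmd is hypothetical; the flags are exactly those in the command above, and the include directory is the PyCUDA path from the question.

```python
import os
import shutil
import subprocess


def dynamic_parallelism_nvcc_cmd(source_file, include_dir, arch="sm_35"):
    """Return the nvcc command from the answer as an argument list.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    return [
        "nvcc",
        "--cubin",           # emit a cubin, as PyCUDA does
        "-arch=" + arch,     # dynamic parallelism needs compute capability 3.5+
        "-m64",
        "-rdc=true",         # relocatable device code (separate compilation)
        "-I" + include_dir,  # include path and source file stay separate arguments
        source_file,
        "-lcudadevrt",       # device runtime library
    ]


if __name__ == "__main__":
    cmd = dynamic_parallelism_nvcc_cmd(
        "kernel.cu", r"c:\python33\lib\site-packages\pycuda\cuda")
    print(" ".join(cmd))
    # Only invoke the compiler if nvcc is on PATH and the source file exists.
    if shutil.which("nvcc") and os.path.exists("kernel.cu"):
        subprocess.run(cmd, check=True)
```

Joined with spaces, the list prints the command line shown above.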

You may also wish to read the nvcc manual on separate compilation.
