Reputation: 406
I'm trying to call a CUDA kernel from another kernel, but I get the following error:
Traceback (most recent call last):
File "C:\temp\GPU Program Shell.py", line 22, in <module>
""")
File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 262, in __init__
arch, code, cache_dir, include_dirs)
File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 252, in compile
return compile_plain(source, options, keep, nvcc, cache_dir)
File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 134, in compile_plain
cmdline, stdout=stdout.decode("utf-8"), stderr=stderr.decode("utf-8"))
pycuda.driver.CompileError: nvcc compilation of c:\users\karste~1\appdata\local\temp\tmpgq8t45\kernel.cu failed
[command: nvcc --cubin -arch sm_35 -m64 -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu]
[stderr:
kernel.cu(14): error: kernel launch from __device__ or __global__ functions requires separate compilation mode
My understanding is that this has to do with Dynamic Parallelism, and that the other question about this error came from a user without appropriate hardware. I have a GTX Titan, however, so it should be compatible. What am I missing?
EDIT
After adding "options=['--cubin','-rdc=true' ,'-lcudart', '-lcudadevrt,','-Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu']" to SourceModule, I get the following error:
Traceback (most recent call last):
File "C:\temp\GPU Program Shell.py", line 22, in <module>
""", options=['--cubin','-rdc=true' ,'-lcudart', '-lcudadevrt,','-Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu'])
File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 265, in __init__
self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: not found -
Upvotes: 0
Views: 1927
Reputation: 151889
PyCUDA is compiling the CUDA code on the fly with this command:
nvcc --cubin -arch sm_35 -m64 -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu
To compile code that uses dynamic parallelism, the compile command needs additional switches that enable separate compilation (relocatable device code), device code linking, and linking against the device runtime library, along with an appropriate architecture target (sm_35 or later).
Some examples of valid nvcc command combinations are given in the programming guide section on dynamic parallelism.
Your command line should look something like:
nvcc --cubin -arch=sm_35 -m64 -rdc=true -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu -lcudadevrt
You may also wish to read the nvcc manual on separate compilation.
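As a rough illustration of how those switches fit together, here is a minimal Python sketch that assembles the command line shown above as an argument list. The helper name `build_nvcc_command` and its parameters are hypothetical and are not part of PyCUDA or nvcc; the flags themselves are the ones from the command above. It only builds the list and does not invoke nvcc.

```python
def build_nvcc_command(source, arch="sm_35", include_dirs=()):
    """Assemble an nvcc argument list for a kernel that launches other kernels.

    Hypothetical helper for illustration only; it is not a PyCUDA API.
    """
    # -rdc=true turns on relocatable device code (separate compilation mode),
    # which the compiler error message asks for.
    cmd = ["nvcc", "--cubin", "-arch=" + arch, "-m64", "-rdc=true"]
    # One -I switch per include directory.
    cmd += ["-I" + d for d in include_dirs]
    cmd.append(source)
    # Link the device runtime library, which provides device-side kernel launches.
    cmd.append("-lcudadevrt")
    return cmd


# Rebuild the command from the answer and print it as a single line.
command = build_nvcc_command(
    "kernel.cu",
    include_dirs=[r"c:\python33\lib\site-packages\pycuda\cuda"],
)
print(" ".join(command))
```

Keeping the arguments in a list rather than one string makes it harder for a stray comma or quote to slip into an individual flag, and the list can be passed directly to `subprocess.run` if you want to try the compile step outside PyCUDA.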
Upvotes: 5