Reputation: 1340
Is it possible to use the Dynamic Parallelism feature on devices with compute capability lower than 3.5?
I know that nvcc returns an error if it encounters a kernel launch inside a kernel, but is it possible to declare a PTX-level function to support this feature?
For example, here I have declared cudaLaunchDevice():
.extern .func(.param .b32 func_retval0) cudaLaunchDevice
(
    .param .b32 func,
    .param .b32 parameterBuffer,
    .param .align 4 .b8 gridDimension[12],
    .param .align 4 .b8 blockDimension[12],
    .param .b32 sharedMemSize,
    .param .b32 stream
)
;
My hope is that the CUDA-level declaration of this function in cuda_device_runtime_api.h will be mapped to the PTX-level declaration above. (Does the program have to be linked against something?)
//CUDA-level declaration of cudaLaunchDevice()
extern "C" __device__
cudaError_t cudaLaunchDevice(void *func, void *parameterBuffer,
dim3 gridDimension, dim3 blockDimension,
unsigned int sharedMemSize,
cudaStream_t stream);
(The code is from the CUDA C Programming Guide v6.5.)
Upvotes: 1
Views: 641
Reputation: 359
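Unfortunately, that does not work. You need a Kepler GK110 (compute capability 3.5) or any first- or second-generation Maxwell card (compute capability 5.x). Declaring cudaLaunchDevice() in PTX only supplies a prototype; the implementation lives in the device runtime library (cudadevrt), which is built only for compute capability 3.5 and higher.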
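If you want to guard against this at run time, the requirement can be written as a simple comparison on the device's compute capability. This is only a minimal sketch: supportsDynamicParallelism is a hypothetical helper name, not part of the CUDA API, and the 3.5 threshold is taken from the programming guide.

```cpp
// Hypothetical helper (not a CUDA API function): returns true when a device
// with compute capability major.minor meets the 3.5 minimum that the
// programming guide states for Dynamic Parallelism.
constexpr bool supportsDynamicParallelism(int major, int minor) {
    // Any 4.x or later device qualifies; within 3.x, the minor must be >= 5.
    return major > 3 || (major == 3 && minor >= 5);
}
```

On the host, the major and minor values would typically come from the major and minor fields of the cudaDeviceProp struct filled in by cudaGetDeviceProperties().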
"Dynamic Parallelism is only supported by devices of compute capability 3.5 and higher." Source: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz3OB61zgmK
Upvotes: 1