Reputation: 1340
When using Dynamic Parallelism in CUDA, you can implement recursive algorithms like mergeSort. I have implemented it and my program don't work for inputs greater than blah.
My question is how many depth in the recursion tree the implementation can go? Is there any limitation? (My program is just fine for smaller inputs.)
Upvotes: 1
Views: 1844
Reputation: 5161
From Professional CUDA C Programming:
The maximum nesting depth of dynamic parallelism is limited to 24, but in reality most kernels will be limited by the amount of memory required by the device runtime system at each new level . . .
Upvotes: 4