Reputation: 58
I have a template function whose template parameter is an integer. This integer is used to create different kernels. Previously, all the possible templates were instantiated by hand in a table (works but ugly), so I tried to use the solution proposed here. Because I have more than 800 possible kernels, the template recursion method is much more elegant. I have tested the template recursion on the C++ version of my code and it works perfectly, but nvcc seems to limit the recursion depth of my instantiation.
Here is a simplified example of my previous ugly template instantiation list, which works properly (even with the 800 kernel instantiations):
// the template kernel
template <int i> __global__ void kernel(int some_data)
{
switch(i)
{
case 0:
// do something
break;
case 1:
// do some other things
break;
//...
case 799:
// do some other things
break;
}
}
typedef void (*kernel_pointer) (int some_data);
// the ugly huge list
kernel_pointer kernel_list[800] = {
&kernel <0>,
&kernel <1>,
//...
&kernel <799> };
int main()
{
int kernel_index = 10;
//the call
kernel_pointer my_kernel = kernel_list[kernel_index];
my_kernel<<<1,1>>>(the_data);
}
Here is the beautiful template recursion that nvcc doesn't like. It replaces the list in the previous piece of code:
#define N_KERNELS 800
template< int i> bool dispatch_init( kernel_pointer* pTable )
{
pTable[i] = &kernel<i>;
return dispatch_init<i-1>( pTable );
}
// edge case of recursion
template<> bool dispatch_init<-1>(kernel_pointer* pTable) { return true; }
// call the recursive function
const bool initialized = dispatch_init<N_KERNELS-1>( kernel_list );
In reality I don't have a single template parameter but 6, which combine to create hundreds of combinations; otherwise, a switch with 800 cases would be really stupid. Does anyone have an idea for increasing the nvcc template recursion limit, or another automatic way to create my list?
Edit: I found the gcc option -ftemplate-depth, which changes the instantiation recursion limit, but I haven't found an equivalent nvcc option.
Upvotes: 0
Views: 911
Reputation: 76597
In CUDA 12 at least (perhaps already in earlier versions), the nvcc compiler has the -ftemplate-depth command line switch.
Upvotes: 0
Reputation: 58
Based on the idea proposed by Robert Crovella, which consists of building the table in a few pieces, here is an example of how to work around the "error":
#include <cstdio>
#define N_KERNELS 850
// template kernel
template <int i> __global__ void kernel(int a)
{
switch(i)
{
case 0:
printf("%d\n", a*i);
break;
case 1:
printf("%d\n", a*i);
break;
//...
case 849:
printf("%d\n", a*i);
break;
}
}
typedef void (*kernel_pointer) (int);
kernel_pointer kernel_list[N_KERNELS];
// Function that instantiates all the needed kernels using recursion.
template< int i> bool dispatch_init( kernel_pointer* pTable )
{
pTable[i] = &kernel<i>;
return dispatch_init<i+1>( pTable );
}
// Edge cases of recursion made with a template specialization
template<> bool dispatch_init<199>(kernel_pointer* pTable)
{
pTable[199] = &kernel<199>;
return true;
}
template<> bool dispatch_init<399>(kernel_pointer* pTable)
{
pTable[399] = &kernel<399>;
return true;
}
template<> bool dispatch_init<599>(kernel_pointer* pTable)
{
pTable[599] = &kernel<599>;
return true;
}
template<> bool dispatch_init<799>(kernel_pointer* pTable)
{
pTable[799] = &kernel<799>;
return true;
}
template<> bool dispatch_init<N_KERNELS>(kernel_pointer* pTable) { return true; }
// Call the recursive function a few times to instantiate all the kernels without reaching the recursive instantiation limit.
// Each result needs its own name; redefining the same variable does not compile.
const bool initialized_0   = dispatch_init<0  >( kernel_list );
const bool initialized_200 = dispatch_init<200>( kernel_list );
const bool initialized_400 = dispatch_init<400>( kernel_list );
const bool initialized_600 = dispatch_init<600>( kernel_list );
const bool initialized_800 = dispatch_init<800>( kernel_list );
int main()
{
int kernel_index = 10;
kernel_pointer my_kernel = kernel_list[kernel_index];
my_kernel<<<1,1>>>(6);
cudaDeviceSynchronize(); // wait for the kernel so the device-side printf output is flushed
}
I don't like this fix, but it will do the trick for the moment. If an Nvidia developer comes here some day: it would be a good idea to add the option "-ftemplate-depth" to nvcc, no?
Upvotes: 1