Template excessive recursion at instantiation cuda

Question

I have a template function whose template parameter is an integer. This integer is used to create different kernels. Previously, all the possible templates were instantiated by hand in a table (works, but ugly), so I tried the solution proposed here. Because I have more than 800 possible kernels, the template recursion method is much more elegant. I have tested the template recursion on the C++ version of my code and it works perfectly, but nvcc seems to limit the recursion depth of my instantiation.

Here is a simplified example of my previous ugly template instantiation list, which works properly (even with the 800 kernel instantiations):

// the template kernel 
template <int i> __global__ void kernel(int some_data)
{
    switch(i)
    {
    case 0:
        // do something
        break;
    case 1:
        // do some other things
        break;
    //...
    case 799:
        // do some other things
        break;
    }
}

typedef void (*kernel_pointer) (int some_data);

// the ugly huge list
kernel_pointer kernel_list[800] = {
    &kernel <0>,
    &kernel <1>,
    //...
    &kernel <799> };

int main()
{
    int kernel_index = 10;

    //the call
    kernel_pointer my_kernel = kernel_list[kernel_index];
    my_kernel<<<1,1>>>(the_data);
}

Here is the beautiful template recursion that nvcc doesn't like. It replaces the list in the previous piece of code:

#define N_KERNELS 800
template< int i> bool dispatch_init( kernel_pointer* pTable )
{
    pTable[i] = &kernel<i>;
    return dispatch_init<i-1>( pTable );
}    
// edge case of recursion
template<> bool dispatch_init<-1>(kernel_pointer* pTable) { return true; }

// call the recursive function
const bool initialized = dispatch_init<N_KERNELS-1>( kernel_list );

In reality I don't have a single template parameter but six, which combine to create the hundreds of combinations; otherwise, a switch with 800 cases would be really stupid. Does anyone know how to increase the nvcc template recursion limit, or another automatic way to build my list?
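
To make the "several parameters behind one index" idea concrete, here is a minimal host-only C++ sketch of one way to flatten several template parameters into a single integer. The three parameters, their ranges, and the names `combo_kernel`, `flat_kernel` and `flat_index` are hypothetical; the real code has six parameters, and in CUDA the functions would be `__global__` kernels returning `void`.

```cpp
// Hypothetical parameter ranges: three parameters with 2, 4 and 5 values.
constexpr int kDimA = 2;
constexpr int kDimB = 4;
constexpr int kDimC = 5;
constexpr int kTotal = kDimA * kDimB * kDimC;  // number of combinations

// Stand-in for the real multi-parameter kernel (assumption: plain host function).
template <int A, int B, int C>
int combo_kernel(int x) { return x + 100 * A + 10 * B + C; }

// Adapter with a single flat index, decoded at compile time
// (mixed radix: C varies fastest, then B, then A).
template <int I>
int flat_kernel(int x)
{
    return combo_kernel<I / (kDimB * kDimC), (I / kDimC) % kDimB, I % kDimC>(x);
}

// Runtime encoding that matches the compile-time decoding above.
constexpr int flat_index(int a, int b, int c)
{
    return (a * kDimB + b) * kDimC + c;
}
```

With such an adapter, the dispatch table only ever has to be indexed by one integer, and `flat_index(a, b, c)` computes that integer at run time.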

Edit: I found the gcc option -ftemplate-depth, which changes the instantiation recursion limit, but I haven't found an equivalent nvcc option.
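
As for "another automatic way" to build the list, one possibility is to expand an index pack in a single initializer instead of recursing once per kernel. The sketch below is host-only C++ and assumes a C++14 toolchain, which the question does not state. The names `kernel_stub`, `stub_pointer`, `make_table` and `stub_table` are hypothetical stand-ins for `kernel`, `kernel_pointer` and `kernel_list`. How deeply `std::make_index_sequence` itself recurses depends on the standard library implementation.

```cpp
#include <array>
#include <cstddef>
#include <utility>

constexpr std::size_t kNumKernels = 800;

// Host-side stand-in for the kernel template (assumption, not the real kernel).
template <int I>
int kernel_stub(int x) { return x * I; }

using stub_pointer = int (*)(int);

// Expand the whole index pack inside one braced initializer:
// no function template calls itself, so there is no per-kernel recursion.
template <std::size_t... Is>
std::array<stub_pointer, sizeof...(Is)> make_table(std::index_sequence<Is...>)
{
    return {{ &kernel_stub<static_cast<int>(Is)>... }};
}

// Table with one entry per index in [0, kNumKernels).
const std::array<stub_pointer, kNumKernels> stub_table =
    make_table(std::make_index_sequence<kNumKernels>{});
```

A lookup then looks the same as with the hand-written list, for example `stub_table[10](6)`.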

Awen · Accepted Answer

Following the idea proposed by Robert Crovella, which consists of building the table in a few pieces, here is an example of how to work around the "error":

#include <cstdio>

#define N_KERNELS 850
// template kernel
template <int i> __global__ void kernel(int a)
{
    switch(i)
    {
    case 0:
        printf("%d\n", a*i);
        break;
    case 1:
        printf("%d\n", a*i);
        break;
    //...
    case 849:
        printf("%d\n", a*i);
        break;
    }
}

typedef void (*kernel_pointer) (int);

kernel_pointer kernel_list[N_KERNELS];

// Function that instantiates all the needed kernels using recursion.
template< int i> bool dispatch_init( kernel_pointer* pTable )
{
    pTable[i] = &kernel<i>;
    return dispatch_init<i+1>( pTable );
}    

// Edge cases of recursion made with a template specialization
template<> bool dispatch_init<199>(kernel_pointer* pTable)
{
    pTable[199] = &kernel<199>;
    return true;
}
template<> bool dispatch_init<399>(kernel_pointer* pTable) 
{
    pTable[399] = &kernel<399>;
    return true;
}
template<> bool dispatch_init<599>(kernel_pointer* pTable)
{
    pTable[599] = &kernel<599>;
    return true;
}
template<> bool dispatch_init<799>(kernel_pointer* pTable)
{
    pTable[799] = &kernel<799>;
    return true;
}
template<> bool dispatch_init<N_KERNELS>(kernel_pointer* pTable) { return true; }

// Call the recursive function a few times to instantiate all the kernels without reaching the recursive instantiation limit.
// Each call needs its own variable name; reusing one name would be a redefinition error.
const bool initialized_0   = dispatch_init<0  >( kernel_list );
const bool initialized_200 = dispatch_init<200>( kernel_list );
const bool initialized_400 = dispatch_init<400>( kernel_list );
const bool initialized_600 = dispatch_init<600>( kernel_list );
const bool initialized_800 = dispatch_init<800>( kernel_list );


int main()
{
    int kernel_index = 10;

    kernel_pointer my_kernel = kernel_list[kernel_index];
    my_kernel<<<1,1>>>(6);
    cudaDeviceSynchronize(); // wait for the kernel so its printf output is flushed
}

I don't like this fix, but it will do the trick for the moment. If an NVIDIA developer comes by some day: it would be a good idea to add the "-ftemplate-depth" option to nvcc, no?
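
The manual split into chunks of 200 can also be automated by halving the index range at each step, so the instantiation depth grows roughly with log2 of the table size rather than with the table size. This is a minimal host-only C++ sketch of that variant, not something I have tested under nvcc. The names `stub`, `stub_ptr`, `fill_range`, `stub_table` and `init_table` are hypothetical; in the CUDA code `stub` would be the `__global__` kernel and `stub_ptr` would be `kernel_pointer`.

```cpp
constexpr int kCount = 850;

// Host-side stand-in for the kernel template (assumption).
template <int I>
int stub(int a) { return a * I; }

using stub_ptr = int (*)(int);

// Fill table[Lo, Lo + Count) by splitting the range in two halves.
// Each level halves Count, so the nesting depth is about log2(Count).
template <int Lo, int Count>
struct fill_range
{
    static void apply(stub_ptr* table)
    {
        fill_range<Lo, Count / 2>::apply(table);
        fill_range<Lo + Count / 2, Count - Count / 2>::apply(table);
    }
};

// Range with a single element: store the pointer for that index.
template <int Lo>
struct fill_range<Lo, 1>
{
    static void apply(stub_ptr* table) { table[Lo] = &stub<Lo>; }
};

// Empty range: nothing to do (only reached for an empty table).
template <int Lo>
struct fill_range<Lo, 0>
{
    static void apply(stub_ptr*) {}
};

stub_ptr stub_table[kCount];

// Fill the table once, during static initialization.
bool init_table()
{
    fill_range<0, kCount>::apply(stub_table);
    return true;
}
const bool table_ready = init_table();
```

For 850 entries this needs about ten nested levels, and no hand-written edge cases at 199, 399, and so on.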
