einpoklum

Reputation: 131525

Using function-templated code across the g++-nvcc boundary (including kernels)

Suppose I compile the following with NVIDIA CUDA's nvcc compiler:

template<typename T, typename Operator>
__global__ void fooKernel(T t1, T t2)  {
    Operator op;
    doSomethingWith(t1, t2);
}

template<typename T>
__device__ __host__ T bar(T t1, T t2)  {
    return t1 + t2;
}

template<typename T, typename Operator>
void foo(T t1, T t2)  {
    fooKernel<<<2, 2>>>(t1, t2);
}

// explicit instantiation
template decltype(foo<int, bar<int>>) foo<int, bar<int>>;

Now, I want my g++-compiled (non-nvcc) code to call foo():

...

template<typename T, typename Operator> void foo(T t1, T t2);


foo<int, bar<int>> (123, 456);
...

I have the appropriate (?) instantiation in the .o/.a/.so file I compile with CUDA.

Can I make that happen?

Upvotes: 2

Views: 1479

Answers (1)

jepio

Reputation: 2281

The problem here is that templated code is typically instantiated at the point of use, which cannot work in your case: foo() contains a kernel launch, and g++ cannot parse the <<<...>>> syntax. Your approach of explicitly instantiating the template and forward declaring it for the host compiler is the right one. Here's how to do it. I slightly fixed up your code and split it into 3 files:

  1. gpu.cu
  2. gpu.cuh
  3. cpu.cpp

gpu.cuh

This file contains the templated code for use by gpu.cu. I gave your foo() function some actual work to do, so that we can verify it runs correctly.

#pragma once
#include <cuda_runtime.h>

template <typename T>
struct bar {
    __device__ __host__ T operator()(T t1, T t2)
    {
        return t1 + t2;
    }
};

template <template <typename> class Operator, typename T>
__global__ void fooKernel(T t1, T t2, T* t3)
{
    Operator<T> op;
    *t3 = op(t1, t2);
}

template <template <typename> class Operator, typename T>
T foo(T t1, T t2)
{
    T* t3_d;
    T t3_h;
    cudaMalloc(&t3_d, sizeof(*t3_d));
    fooKernel<Operator><<<1, 1>>>(t1, t2, t3_d);
    cudaMemcpy(&t3_h, t3_d, sizeof(*t3_d), cudaMemcpyDeviceToHost);
    cudaFree(t3_d);
    return t3_h;
}

gpu.cu

This file only explicitly instantiates the foo() function, so that the instantiation is available at link time:

#include "gpu.cuh"

template int foo<bar>(int, int);

cpu.cpp

In this plain C++ source file, we must make sure no template instantiation is triggered, since the definitions (with their kernel launch) would not compile under g++. Instead we only forward declare the struct bar and the function foo; the explicit instantiation in gpu.o satisfies the linker. The code looks like this:

#include <cstdio>

template <template <typename> class Operator, typename T>
T foo(T t1, T t2);

template <typename T>
struct bar;

int main()
{
    printf("%d \n", foo<bar>(3, 4));
}

Makefile

This will put the code all together into an executable:

.PHONY: clean all
all: main

clean:
        rm -f *.o main

main: gpu.o cpu.o
        g++ -L/usr/local/cuda/lib64 $^ -lcudart -o $@

gpu.o: gpu.cu
        nvcc -c -arch=sm_20 $< -o $@

cpu.o: cpu.cpp
        g++ -c $< -o $@

Device code is compiled by nvcc, host code by g++ and it all gets linked by g++. Upon running you see the beautiful result:

7

The key thing to remember here is that kernel launches and kernel definitions have to be in the .cu files that are compiled by nvcc. For future reference, I will also leave this link here, on separation of linking and compilation with CUDA.
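As a related note (not needed in this example, since all device code lives in the single gpu.cu): if device functions defined in one .cu file are called from kernels in another, nvcc needs relocatable device code plus a device-link step. A sketch, with hypothetical file names a.cu and b.cu:

```shell
# Compile each .cu file with relocatable device code (-dc):
nvcc -dc -arch=sm_20 a.cu -o a.o
nvcc -dc -arch=sm_20 b.cu -o b.o
# The device-link step resolves cross-file device symbols:
nvcc -dlink -arch=sm_20 a.o b.o -o dlink.o
# Final host link as before:
g++ a.o b.o dlink.o -L/usr/local/cuda/lib64 -lcudart -o main
```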

Upvotes: 2
