Reputation: 1415
I have a piece of C++ CUDA code that I need to build with the data variables declared as float, and I also need a version of the same code with the data variables declared as double.
What is a good design to handle a situation like this in CUDA?
I do not want to maintain two copies of the same code, because then any future change would have to be made in two otherwise identical places. I also want to keep the code clean, without scattering #ifdef blocks through it to switch between float and double.
Can anyone please suggest any good (in terms of maintenance and "easy to read") design?
Upvotes: 2
Views: 216
Reputation: 151799
In addition to templating, you may be able to achieve what you want with a single typedef:

typedef float mysize;  // or double

Then just use mysize throughout where you would use float or double.
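For instance, a kernel written against the alias needs no source changes when the typedef is switched; only a recompile is required (a minimal sketch; the kernel name and the scaling operation are illustrative, not from the original post):

```cuda
typedef float mysize;  // change to double and recompile for double precision

// Scales each array element in place. The kernel only refers to the
// alias, so the identical source serves both precisions.
__global__ void scale(mysize *data, const mysize factor, const int n)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < n)
        data[tid] *= factor;
}
```

The trade-off versus templates is that a typedef fixes one type per build, whereas a template lets both instantiations coexist in a single binary.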
You might be interested in the simpleTemplates sample code, and there are other templatized CUDA examples as well, in addition to thrust where, as talonmies states, it's used extensively. Thrust provides many other benefits to C++ programmers as well.
Upvotes: 2
Reputation: 72349
CUDA supports type templating, and it is without doubt the most efficient way to implement kernel code where you need to handle multiple types in the same code.
As a trivial example, consider a simple BLAS AXPY type kernel:
template<typename Real>
__global__ void axpy(const Real *x, Real *y, const int n, const Real a)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int stride = blockDim.x * gridDim.x;
    for (; tid < n; tid += stride) {
        Real yval = y[tid];
        yval += a * x[tid];
        y[tid] = yval;
    }
}
This templated kernel can be instantiated for both double and single precision without loss of generality:
template __global__ void axpy<float>(const float *, float *, const int, const float);
template __global__ void axpy<double>(const double *, double *, const int, const double);
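Host code can then launch whichever instantiation matches the data, with the compiler deducing or taking the type explicitly (a sketch; the grid dimensions and pointer names are illustrative, and the allocations are assumed to happen elsewhere with cudaMalloc):

```cuda
float  *d_x, *d_y;   // device arrays of float, allocated elsewhere
double *d_u, *d_v;   // device arrays of double, allocated elsewhere
const int n = 1 << 20;

// Each launch uses the instantiation matching its argument types;
// both precisions coexist in the same binary.
axpy<float><<<256, 256>>>(d_x, d_y, n, 2.0f);
axpy<double><<<256, 256>>>(d_u, d_v, n, 2.0);
```

This is why templating scales better than a typedef when both precisions are needed at run time rather than chosen at build time.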
The thrust template library, which ships with all recent versions of the CUDA toolkit, makes extensive use of this facility for implementing type agnostic algorithms.
Upvotes: 6