Why can't we split __host__ and __device__ implementations?

Question

If we have a __host__ __device__ function in CUDA, we can use macros to choose different code paths for host-side and device-side code in its implementation, like so:

__host__ __device__ int foo(int x)
{
#ifdef __CUDA_ARCH__
    return x * 2;
#else
    return x;
#endif
}

but why is it that we can't write:

__host__ __device__ int foo(int x);

__device__ int foo(int x) { return x * 2; }
__host__   int foo(int x) { return x; }

instead?

Michael Kenzel · Accepted Answer

The Clang implementation of CUDA C++ actually supports overloading on __host__ and __device__ because it considers the execution space qualifiers part of the function signature. Note, however, that even there, you'd have to declare the two functions separately:

__device__ int foo(int x);
__host__ int foo(int x);

__device__ int foo(int x) { return x * 2; }
__host__   int foo(int x) { return x; }


Personally, I'm not sure how desirable or important that really is to have, though. Consider that you can just define a foo(int x) in host code outside of your CUDA source. If someone told me they need different implementations of the same function for host and device, and the host version for some reason has to be defined as part of the CUDA source, my gut feeling would be that the design is heading in an odd direction. If the host version does something different, shouldn't it most likely have a different name? If it logically does the same thing, just without using the GPU, then why does it have to be part of the CUDA source?

I'd generally advocate keeping the separation between host and device code as clean and strict as possible, and keeping any host code inside the CUDA source to a bare minimum. Even if you don't care about the cleanliness of your code, doing so will at least minimize the chances of getting hurt by all the compiler magic that goes on under the hood…
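To make the "different name" suggestion concrete, here is a minimal sketch rather than a definitive layout. The names foo_device, foo_host and scale_kernel are hypothetical and do not come from the question. The fallback macros at the top are also an assumption on my part: they only exist so that the same file builds with a plain C++ compiler, in which case the __device__ function is compiled as an ordinary host function.

```cpp
// Assumption: when no CUDA compiler is in use, let the execution space
// qualifiers expand to nothing so this file still builds as plain C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical names: two functions with different behavior get
// different names instead of sharing the name foo.
__device__ inline int foo_device(int x) { return x * 2; }
__host__   inline int foo_host(int x)   { return x; }

#ifdef __CUDACC__
// Device-side caller; only compiled when a CUDA compiler is used.
// Each thread rewrites one element of data using the device version.
__global__ void scale_kernel(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = foo_device(data[i]);
}
#endif
```

Host code then calls foo_host(x) directly, and only kernels call foo_device(x), so neither overloading on execution space nor a __CUDA_ARCH__ switch is needed. In a real project, foo_host could live in an ordinary .cpp file that the CUDA compiler never sees.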
