Reputation: 131546
If we have a __host__ __device__ function in CUDA, we can use macros to choose different code paths for host-side and device-side code in its implementation, like so:
__host__ __device__ int foo(int x)
{
#ifdef __CUDA_ARCH__
return x * 2;
#else
return x;
#endif
}
But why is it that we can't write:
__host__ __device__ int foo(int x);
__device__ int foo(int x) { return x * 2; }
__host__ int foo(int x) { return x; }
instead?
Upvotes: 2
Views: 1183
Reputation: 15941
The Clang implementation of CUDA C++ actually supports overloading on __host__ and __device__ because it considers the execution space qualifiers part of the function signature. Note, however, that even there, you'd have to declare the two functions separately:
__device__ int foo(int x);
__host__ int foo(int x);
__device__ int foo(int x) { return x * 2; }
__host__ int foo(int x) { return x; }
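If the same source has to build with both compilers, the two approaches can be combined. The following is only a sketch, not something from the question: it assumes that Clang defines both __clang__ and __CUDA__ when compiling CUDA code, and that any CUDA compiler defines __CUDACC__. The empty fallback macros are an addition so that the snippet also compiles as plain C++.

```cpp
// Assumption: nvcc and Clang's CUDA mode both define __CUDACC__ and provide
// the execution space qualifiers. For a plain C++ compiler, define them away.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

#if defined(__clang__) && defined(__CUDA__)
// Clang compiling CUDA: the execution space is part of the signature,
// so these are two separate overloads.
__device__ inline int foo(int x) { return x * 2; }
__host__ inline int foo(int x) { return x; }
#else
// nvcc (or a plain host compiler): a single function that branches on
// __CUDA_ARCH__, which is only defined during device compilation.
__host__ __device__ inline int foo(int x)
{
#ifdef __CUDA_ARCH__
    return x * 2;  // device code path
#else
    return x;      // host code path
#endif
}
#endif
```

Called from host code, foo(3) takes the host path in either branch and returns 3; only a call from device code would return 6.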
Personally, I'm not sure how desirable or important that really is, though. Consider that you can just define a foo(int x) in the host code outside of your CUDA source. If someone told me they need different implementations of the same function for host and device, where the host version for some reason has to be defined as part of the CUDA source, my initial gut feeling would be that something is likely going in a bit of an odd direction. If the host version does something different, shouldn't it most likely have a different name? If it logically does the same thing, just not using the GPU, then why does it have to be part of the CUDA source?
I'd generally advocate for keeping as clean and strict a separation between host and device code as possible, and keeping any host code inside the CUDA source to the bare minimum. Even if you don't care about the cleanliness of your code, doing so will at least minimize the chances of getting hurt by all the compiler magic that goes on under the hood…
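As a rough illustration of the "different name" suggestion, here is a minimal sketch. The names foo_host and foo_device are hypothetical, and the empty fallback macros are again only an assumption to let the snippet compile without a CUDA compiler. In a real project the host version would more likely live in an ordinary C++ source file rather than next to the device code.

```cpp
// Assumption: a CUDA compiler defines __CUDACC__ and provides the qualifiers;
// for a plain C++ compiler they are defined away.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Device-only implementation (hypothetical name), callable from kernels.
__device__ inline int foo_device(int x) { return x * 2; }

// Host-only implementation (hypothetical name); needs nothing from CUDA and
// could just as well be defined in a regular .cpp file.
__host__ inline int foo_host(int x) { return x; }
```

With distinct names there is no overload to resolve and no __CUDA_ARCH__ branch, so the same code behaves identically under nvcc and Clang.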
Upvotes: 3