How do I get the current compute capability of a GPU from the host portion of the code?

Question

I tried to use __CUDA_ARCH__, but I read that it is only defined in the device portion of the code. After that, I came across this code on GitHub: link

Is there any better way to achieve this?

I am asking because I would like to determine, in host code, whether the GPU supports unified memory. If it does, I would allocate with cudaMallocManaged; otherwise I would fall back to cudaMalloc and cudaMemcpy calls.

Example of what I would like to do:

int main() {
  // IF CUDA >= 6.0 && COMPUTE CAPABILITY >= 3.0
      // USE cudaMallocManaged
  // ELSE
      // USE cudaMallocs && cudaMemcpys
  // END IF
  return 0;
}

Robert Crovella · Accepted Answer

There seem to be two questions involved here:

How can I query (at compile time) the CUDA runtime API version that a particular code is being compiled for, so that I can determine whether it is safe to use certain runtime API elements (such as those associated with managed memory) which may only have appeared in newer runtime API versions?

One method is already discussed here. As a condensed version for this particular case, you could do something like:
```
#include <cuda_runtime_api.h>  // defines CUDART_VERSION
...
// test for CUDA version of 6.0 or higher
#if CUDART_VERSION >= 6000 
// safe to use e.g. cudaMallocManaged() here
#else
// e.g. do not use managed memory API here
#endif
```
How can I determine if I can use managed memory at run-time?

As already mentioned in the comments, once you have established that the code is being compiled against CUDA 6.0 or higher (e.g. see above), you should still test at run-time whether the device supports managed memory before attempting to use cudaMallocManaged. The deviceQuery CUDA sample code shows the general methodology for testing capabilities at run-time: call cudaGetDeviceProperties and inspect the returned cudaDeviceProp structure, whose managedMemory field reports managed memory support and whose major and minor fields give the compute capability.
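To make the run-time check concrete, here is a minimal host-side sketch, not a definitive implementation. The names GpuCaps, canUseManagedMemory, queryGpuCaps and HAVE_CUDA_RUNTIME are hypothetical helpers introduced for illustration; only cudaGetDeviceProperties, cudaDeviceProp (fields major, minor, managedMemory), cudaSuccess and CUDART_VERSION come from the CUDA runtime. The compute capability threshold of 3.0 simply mirrors the pseudocode in the question; the managedMemory flag is the check that actually matters.

```cpp
#include <cassert>

// Hypothetical helper type (not part of the CUDA API): the few device
// properties needed to choose an allocation strategy.
struct GpuCaps {
    int major;          // compute capability, major part (cudaDeviceProp::major)
    int minor;          // compute capability, minor part (cudaDeviceProp::minor)
    int managedMemory;  // nonzero if the device reports managed memory support
};

// Pure decision logic mirroring the question's pseudocode:
// runtime >= 6.0, compute capability >= 3.0, and the device reports support.
inline bool canUseManagedMemory(int runtimeVersion, const GpuCaps& caps) {
    return runtimeVersion >= 6000 && caps.major >= 3 && caps.managedMemory != 0;
}

// Compile the CUDA-dependent part only when the runtime header is available,
// so the decision logic above can also be built without the toolkit.
#if defined(__has_include)
#  if __has_include(<cuda_runtime_api.h>)
#    include <cuda_runtime_api.h>
#    define HAVE_CUDA_RUNTIME 1  // hypothetical macro, local to this sketch
#  endif
#endif

#ifdef HAVE_CUDA_RUNTIME
// Fills *caps for the given device; returns false if the query fails.
inline bool queryGpuCaps(int device, GpuCaps* caps) {
    assert(caps != nullptr);
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;
    }
    caps->major = prop.major;
    caps->minor = prop.minor;
#if CUDART_VERSION >= 6000
    caps->managedMemory = prop.managedMemory;  // field exists from CUDA 6.0 on
#else
    caps->managedMemory = 0;                   // older toolkits: no managed memory
#endif
    return true;
}
#endif
```

In main, one would call queryGpuCaps(0, &caps) and, if it succeeds, branch on canUseManagedMemory(CUDART_VERSION, caps). Any call to cudaMallocManaged itself still has to sit inside an #if CUDART_VERSION >= 6000 block, because the symbol does not exist in older toolkits.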
