Umbrella

Reputation: 505

Allocating global variables on multiple GPUs

I have a code working on a single GPU. In that code, I used

__device__ uint32_t aaa;

This line, placed at the beginning of the code, declares a global variable on the only device involved.

Now I want to use multiple devices (two or more), but I don't know how to allocate global variables in this case.

I think I should use cudaSetDevice(), but I am not sure where to call it.

Upvotes: 1

Views: 1129

Answers (1)

Robert Crovella

Reputation: 152164

When you create a variable like this:

__device__ int myval;

it is created at global scope. An allocation for it is made in the GPU memory of each device that is present when your application is launched.

In host code (e.g., when using functions such as cudaMemcpyFromSymbol()), you will be accessing whichever copy corresponds to your most recent cudaSetDevice() call. In device code, you will be accessing whichever copy corresponds to the device your code is executing on.
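As a minimal sketch of this behavior (the device count loop and the example values are illustrative, not from the original question):

```cuda
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

// One copy of this variable exists in each device's global memory.
__device__ uint32_t aaa;

int main() {
    int numDevices = 0;
    cudaGetDeviceCount(&numDevices);

    // Select each device in turn; the symbol copy targets
    // the copy of `aaa` on the currently selected device.
    for (int dev = 0; dev < numDevices; ++dev) {
        cudaSetDevice(dev);
        uint32_t val = 42 + dev;  // a distinct example value per device
        cudaMemcpyToSymbol(aaa, &val, sizeof(val));
    }

    // Read each copy back to confirm the devices hold different values.
    for (int dev = 0; dev < numDevices; ++dev) {
        cudaSetDevice(dev);
        uint32_t out = 0;
        cudaMemcpyFromSymbol(&out, aaa, sizeof(out));
        printf("device %d: aaa = %u\n", dev, out);
    }
    return 0;
}
```

Each cudaMemcpyToSymbol()/cudaMemcpyFromSymbol() pair operates on the copy belonging to the most recently selected device, so no extra bookkeeping is needed to address a particular GPU's instance.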

The __device__ declaration is at global scope (and statically allocated) in your program. Variables at global scope are set up without the help of any runtime activity, so there is no opportunity to specify which devices the variable should be instantiated on; CUDA therefore instantiates such variables on all devices present. Dynamically allocated device variables, by contrast, are managed with runtime calls such as cudaMalloc and cudaMemcpy. In a multi-GPU system you can precede these calls with a cudaSetDevice call, and the CUDA runtime will manage the allocations on a per-device basis. This is consistent with the behavior of most CUDA runtime API calls, which operate on the device most recently selected via cudaSetDevice.
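The dynamic alternative described above might look like this (a sketch; the fixed-size pointer array and initialization value are assumptions for illustration):

```cuda
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

int main() {
    int numDevices = 0;
    cudaGetDeviceCount(&numDevices);
    if (numDevices > 8) numDevices = 8;  // assume at most 8 devices here

    // One device pointer per GPU; cudaMalloc allocates on the device
    // most recently selected with cudaSetDevice.
    uint32_t *d_val[8] = {nullptr};
    for (int dev = 0; dev < numDevices; ++dev) {
        cudaSetDevice(dev);
        cudaMalloc(&d_val[dev], sizeof(uint32_t));
        uint32_t init = 0;
        cudaMemcpy(d_val[dev], &init, sizeof(init), cudaMemcpyHostToDevice);
    }

    // ... launch kernels, passing d_val[dev] to the kernel on device dev ...

    for (int dev = 0; dev < numDevices; ++dev) {
        cudaSetDevice(dev);
        cudaFree(d_val[dev]);
    }
    return 0;
}
```

Unlike the static __device__ variable, each allocation here exists only on the device you explicitly selected, and the pointer must be passed to kernels as an argument rather than referenced by name.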

Upvotes: 3
