Jacky Lau

Reputation: 31

How to share global memory within multiple kernels and multiple GPUs?

----------------a.c---------------------
char *XX;                          /* global, shared by all host threads */
func1(){
  for(...){
    for(i = 0; i < 4; i++)
      cutStartThread(func2, args); /* one host thread per GPU */
  }
}
---------------b.cu-------------------
func2(args){
  cudaSetDevice(i);
  cudaMalloc((void **)&xx, size);
  mykernel<<<...>>>(xx);
}
--------------------------------------
--------------------------------------

I recently started using multiple GPU devices in my program. There are four Tesla C2075 cards on my node, and I use four host threads to manage the four GPUs. In addition, the kernel in each thread is launched several times. Simple pseudocode is shown above. I have two questions:

  1. Variable XX is a very long string and is read-only in the kernel. I want it to persist across the multiple launches of mykernel. Is it OK to call cudaMalloc and pass the pointer to mykernel only when mykernel is first launched, or should I use the __device__ qualifier?

  2. XX is used by all four threads, so I declared it as a global variable in file a.c. Is calling cudaMalloc on XX multiple times correct, or should I use an array such as char *xx[4]?

Upvotes: 0

Views: 2416

Answers (1)

Robert Crovella

Reputation: 152164

  1. For usage by kernels running on a single device, you can call cudaMalloc once to create your variable XX holding the string, then pass the pointer returned by cudaMalloc (i.e. XX) to whichever kernels need it.

    #define xx_length 20
    char *XX;
    cudaMalloc((void **)&XX, xx_length * sizeof(char));
    ...
    kernel1<<<...>>>(XX, ...);
    ...
    kernel2<<<...>>>(XX, ...);
    // etc.
    
  2. Create a separate XX variable for each thread, assuming that each thread is being used to access a different device. How exactly you do this will depend on the scope of XX. But an array of:

    char *XX[num_devices]; 
    

at global scope, should be OK.
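
Putting both points together, a minimal sketch of the per-thread pattern might look like the following (names such as `worker`, `NUM_DEVICES`, and `XX_LENGTH` are illustrative, and error checking is omitted for brevity):

```cuda
#include <cuda_runtime.h>

#define NUM_DEVICES 4
#define XX_LENGTH   20
#define N_LAUNCHES  3

char *XX[NUM_DEVICES];   // one device pointer per GPU, at global scope

__global__ void mykernel(const char *xx)
{
    // read-only use of xx ...
}

// Body of each host thread; dev identifies the GPU this thread manages.
void worker(int dev)
{
    cudaSetDevice(dev);                                      // bind this thread to its GPU
    cudaMalloc((void **)&XX[dev], XX_LENGTH * sizeof(char)); // allocate once per device
    // ... cudaMemcpy the string to XX[dev] once ...
    for (int i = 0; i < N_LAUNCHES; i++)
        mykernel<<<1, 32>>>(XX[dev]);                        // same pointer, multiple launches
    cudaDeviceSynchronize();
    cudaFree(XX[dev]);                                       // free when this thread is done
}
```

The allocation made by cudaMalloc persists until it is freed (or the device is reset), so there is no need to reallocate between kernel launches.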

The CUDA OpenMP sample may be of interest as an example of how to use multiple threads to manage multiple GPUs.

Upvotes: 1
