Jacky Lau

Reputation: 31

How to share global memory within multiple kernels and multiple GPUs?

----------------a.c---------------------
char *XX;                          /* global, shared by all host threads */
func1(){
  for(...){
    for(i = 0; i < 4; i++)
      cutStartThread(func2, args); /* one host thread per GPU */
  }
}
---------------b.cu-------------------
func2(args){
  cudaSetDevice(i);
  cudaMalloc((void **)&xx, size);
  mykernel<<<...>>>(xx);
}
--------------------------------------
--------------------------------------

I recently started using multiple GPU devices in my program. There are four Tesla C2075 cards on my node, and I use four host threads to manage the four GPUs. In addition, the kernel in each thread is launched several times. Simple pseudocode is shown above. I have two questions:

  1. Variable XX is a very long string and is read-only in the kernel. I want it to persist across the multiple launches of mykernel. Is it OK to call cudaMalloc and pass the pointer to mykernel only when mykernel is first launched, or should I use the __device__ qualifier?

  2. XX is used by all four threads, so I declared it as a global variable in file a.c. Is calling cudaMalloc on XX multiple times correct, or should I use an array such as char *xx[4]?

Upvotes: 0

Views: 2416

Answers (1)

Robert Crovella

Reputation: 152164

  1. For usage by kernels running on a single device, you can call cudaMalloc once to create your variable XX holding the string, then pass the pointer returned by cudaMalloc (i.e. XX) to whichever kernels need it.

    #define xx_length 20
    char *XX;
    cudaMalloc((void **)&XX, xx_length * sizeof(char));
    ...
    kernel1<<<...>>>(XX, ...);
    ...
    kernel2<<<...>>>(XX, ...);
    // etc.
    
  2. Create a separate XX variable for each thread, assuming that each thread is being used to access a different device. How exactly you do this will depend on the scope of XX. But an array of:

    char *XX[num_devices]; 
    

at global scope, should be OK.
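
Putting both points together, a minimal sketch of the per-thread pattern might look like the following (names such as `worker`, `NUM_DEVICES`, and `XX_LENGTH` are illustrative, and error checking is omitted for brevity):

```cuda
#include <cuda_runtime.h>

#define NUM_DEVICES 4
#define XX_LENGTH   20
#define N_LAUNCHES  3

char *XX[NUM_DEVICES];   // one device pointer per GPU, at global scope

__global__ void mykernel(const char *xx)
{
    // read-only use of xx ...
}

// Body of each host thread; dev identifies the GPU this thread manages.
void worker(int dev)
{
    cudaSetDevice(dev);                                      // bind this thread to its GPU
    cudaMalloc((void **)&XX[dev], XX_LENGTH * sizeof(char)); // allocate once per device
    // ... cudaMemcpy the string to XX[dev] once ...
    for (int i = 0; i < N_LAUNCHES; i++)
        mykernel<<<1, 32>>>(XX[dev]);                        // same pointer, multiple launches
    cudaDeviceSynchronize();
    cudaFree(XX[dev]);                                       // free when this thread is done
}
```

The allocation made by cudaMalloc persists until it is freed (or the device is reset), so there is no need to reallocate between kernel launches.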

The CUDA OpenMP sample may be of interest as an example of how to use multiple threads to manage multiple GPUs.

Upvotes: 1
