Reputation: 31
----------------a.c---------------------
variable *XX;
func1() {
    for (...) {
        for (i = 0; i < 4; i++)
            cutStartThread(func2, args);
    }
}
---------------b.cu-------------------
func2(args) {
    cudaSetDevice(i);
    xx = cudaMalloc(...);
    mykernel<<<...>>>(xx);
}
--------------------------------------
I recently wanted to use multiple GPU devices for my program. There are four Tesla C2075 cards in my node. I use four threads to manage the four GPUs, and the kernel in each thread is launched several times. Simple pseudocode is shown above. I have two questions:
1. Variable XX is a very long string and is read-only in the kernel. I want to preserve it across the multiple launches of mykernel. Is it OK to call cudaMalloc and pass the pointer to mykernel only when mykernel is first launched? Or should I use the __device__ qualifier?
2. XX is used in four threads, so I declare it as a global variable in file a.c. Are multiple cudaMalloc calls on XX correct, or should I use an array such as variable *xx[4]?
Upvotes: 0
Views: 2416
Reputation: 152164
For usage by kernels running on a single device, you can call cudaMalloc once to create your variable XX holding the string, then pass the pointer created by cudaMalloc (i.e. XX) to whichever kernels need it.
#define xx_length 20
char *XX;
cudaMalloc((void **)&XX, xx_length * sizeof(char));
...
kernel1<<<...>>>(XX, ...);
...
kernel2<<<...>>>(XX, ...);
etc.
Create a separate XX variable for each thread, assuming that each thread is being used to access a different device. How exactly you do this will depend on the scope of XX, but an array:
char *XX[num_devices];
at global scope should be OK. Each thread then calls cudaSetDevice for its own device before calling cudaMalloc on its own element of the array.
The CUDA OpenMP sample may be of interest as an example of how to use multiple threads to manage multiple GPUs.
Upvotes: 1