CUDA Global Array declaration and initialization before kernel call example

Question

I need some help with CUDA global memory. In my project I need to declare a global array so that I don't have to send this array on every kernel call.

Edit:

My application can call the kernel more than 1,000 times, and on every call I send it an array larger than 1000 x 1000 elements. I think this is taking a lot of time, which is why my app runs slowly. So I need to declare a global array on the GPU, and my questions are:

How do I declare a global array?
How do I initialize a global array from the CPU before the kernel launch?

Thanks in advance!

Robert Crovella · Accepted Answer

Your edited question is confusing because you say you are sending your kernel an array of size 1000 x 1000 but you want to know how to do this using a global array. The only way I know of to send this much data to a kernel is to use a global array, so you are probably already doing this with an array in global memory.
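To illustrate why a global-memory array avoids the per-call cost, here is a minimal sketch. It assumes the slowdown comes from copying the host data to the GPU before every launch: here the data is copied to global memory once, each of the many launches receives only the device pointer, and the result is copied back once at the end. The kernel `addOne` and the helper `runRepeated` are hypothetical names used only for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element of a device array.
__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Copies the host data to the GPU once, launches addOne `launches` times on
// the same device buffer, then copies the result back once.
// Returns the first element after all launches.
int runRepeated(int n, int launches)
{
    std::vector<int> host(n, 5);
    int *d_data = nullptr;
    cudaMalloc((void **)&d_data, n * sizeof(int));
    cudaMemcpy(d_data, host.data(), n * sizeof(int), cudaMemcpyHostToDevice); // one transfer in

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int k = 0; k < launches; ++k)
        addOne<<<blocks, threads>>>(d_data, n);   // only the pointer is passed per launch

    // cudaMemcpy waits for the preceding kernels in the default stream to finish.
    cudaMemcpy(host.data(), d_data, n * sizeof(int), cudaMemcpyDeviceToHost); // one transfer out
    cudaFree(d_data);
    return host[0];
}

int main()
{
    int first = runRepeated(1000 * 1000, 1000);
    printf("first element after 1000 launches: %d\n", first); // 5 + 1000 = 1005
    return 0;
}
```

Passing `d_data` to a kernel copies only an 8-byte pointer, so the launch itself stays cheap no matter how large the array is.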

Nevertheless, there are at least two methods to create and initialize an array in global memory:

1. Statically, using __device__ and cudaMemcpyToSymbol, for example:

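 #include <cuda_runtime.h>

 #define SIZE 100
 __device__ int A[SIZE];  // statically allocated array in device global memory
 // ... (kernels that read/write A directly)
 int main(){
   int myA[SIZE];
   for (int i=0; i< SIZE; i++) myA[i] = 5;
   // copy host data into the __device__ array; no cudaMalloc is needed
   cudaMemcpyToSymbol(A, myA, SIZE*sizeof(int));
   // ... (kernel calls, etc.)
   return 0;
 }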

(See the CUDA documentation for __device__ variables and for cudaMemcpyToSymbol.)
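For completeness, here is a minimal runnable sketch of the static approach. The kernel uses `A` directly by name instead of receiving it as an argument, and `cudaMemcpyFromSymbol` reads the array back to confirm the result. The kernel `doubleA` and the helper `initAndDouble` are hypothetical names for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 100

__device__ int A[SIZE];          // statically allocated array in global memory

// Hypothetical kernel: doubles each element of the __device__ array in place.
// A is referenced directly; it is not passed as a kernel argument.
__global__ void doubleA()
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < SIZE) A[i] *= 2;
}

// Initializes A from the host, runs the kernel, and returns the sum read back.
int initAndDouble()
{
    int myA[SIZE];
    for (int i = 0; i < SIZE; i++) myA[i] = 5;
    cudaMemcpyToSymbol(A, myA, SIZE * sizeof(int));   // host -> __device__ array

    doubleA<<<(SIZE + 127) / 128, 128>>>();

    cudaMemcpyFromSymbol(myA, A, SIZE * sizeof(int)); // __device__ array -> host
    int sum = 0;
    for (int i = 0; i < SIZE; i++) sum += myA[i];
    return sum;
}

int main()
{
    printf("sum = %d\n", initAndDouble()); // 100 elements * 10 = 1000
    return 0;
}
```

Because the array size is fixed at compile time, this approach suits data whose size is known in advance.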

2. Dynamically, using cudaMalloc and cudaMemcpy:

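 #include <cuda_runtime.h>

 #define SIZE 100
 // ... (kernels that take an int* argument)
 int main(){
   int myA[SIZE];
   int *A;                   // device pointer; pass it to kernels as an argument
   for (int i=0; i< SIZE; i++) myA[i] = 5;
   cudaMalloc((void **)&A, SIZE*sizeof(int));                     // allocate global memory
   cudaMemcpy(A, myA, SIZE*sizeof(int), cudaMemcpyHostToDevice);  // host -> device
   // ... (kernel calls, etc.)
   cudaFree(A);              // release the allocation when finished
   return 0;
 }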

(See the CUDA documentation for cudaMalloc and cudaMemcpy.)
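A minimal runnable sketch of the dynamic approach follows. Unlike the static version, the kernel receives the array as an ordinary pointer argument, which also makes the size a runtime choice. The kernel `addValue` and the helper `allocInitAdd` are hypothetical names for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 100

// Hypothetical kernel: adds `value` to each element. The array arrives as a
// plain device pointer argument.
__global__ void addValue(int *A, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) A[i] += value;
}

// Allocates and initializes the device array, runs the kernel once,
// and returns the last element read back to the host.
int allocInitAdd(int value)
{
    int myA[SIZE];
    int *A = nullptr;
    for (int i = 0; i < SIZE; i++) myA[i] = 5;
    cudaMalloc((void **)&A, SIZE * sizeof(int));
    cudaMemcpy(A, myA, SIZE * sizeof(int), cudaMemcpyHostToDevice);

    addValue<<<(SIZE + 127) / 128, 128>>>(A, SIZE, value);

    cudaMemcpy(myA, A, SIZE * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(A);                 // release the allocation when done
    return myA[SIZE - 1];
}

int main()
{
    printf("last element = %d\n", allocInitAdd(3)); // 5 + 3 = 8
    return 0;
}
```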

For clarity I've omitted error checking, which you should do on all CUDA API calls and kernel launches.
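One common way to add that checking is a small wrapper macro, sketched below under the assumption that aborting on the first error is acceptable. `CHECK_CUDA`, `fill`, and `fillAndRead` are hypothetical names; `cudaGetLastError` catches launch errors, while `cudaDeviceSynchronize` surfaces errors that occur while the kernel runs.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with file/line info if a CUDA runtime call fails.
#define CHECK_CUDA(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                    \
                    cudaGetErrorString(err_), __FILE__, __LINE__);         \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Hypothetical kernel: sets every element to v.
__global__ void fill(int *A, int n, int v)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) A[i] = v;
}

// Fills a device array with v (checking every call) and returns element 0.
int fillAndRead(int v)
{
    const int n = 100;
    int *A = nullptr;
    CHECK_CUDA(cudaMalloc((void **)&A, n * sizeof(int)));

    fill<<<(n + 127) / 128, 128>>>(A, n, v);
    CHECK_CUDA(cudaGetLastError());        // catches launch-configuration errors
    CHECK_CUDA(cudaDeviceSynchronize());   // catches errors raised while the kernel ran

    int h[n];
    CHECK_CUDA(cudaMemcpy(h, A, n * sizeof(int), cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(A));
    return h[0];
}

int main()
{
    printf("h[0] = %d\n", fillAndRead(7));
    return 0;
}
```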
