Ono

Reputation: 1357

Saving constants in CUDA on the GPU

If I want to do a for loop within which there is a CUDA code-based computation, and there are a bunch of constants that I do not want to transfer back and forth between the CPU and GPU each time the code is executed, is there anything I can do?

For example:

double *a, *b, *c; // a, b, and c change each time the for loop executes
int M, N;          // M and N get their values prior to the for loop, and
                   // they do not change during the for loop

for (int n = 0; n < 100; n++)
{
    CUDAComputation<<<grid, block>>>(a, b, c, M, N);
}

__global__ void CUDAComputation(double *a,
                                double *b,
                                double *c,
                                int M,
                                int N)
{
    // cuda-based code
}

I think I can declare global variables in the .cu code that includes my header file, but then M and N live in global memory, which is slow for the kernel to access, right? Or do I have to cudaMemcpy() M and N to the device each time before launching the kernel? Thanks.

Upvotes: 0

Views: 115

Answers (1)

Christian Sarofeen

Reputation: 2250

M and N are ints passed as kernel parameters. I would not worry about the cost of this: launching a kernel already carries some overhead, and the additional cost of sending two ints along with it is insignificant. However, you can do the following:

__device__ int d_M, d_N;
int h_M, h_N;

__global__ void CUDAComputation(){
    // d_M and d_N are accessible in here
}

void runKernel(){
    h_M = 25; h_N = 24;
    // copy each host value into its matching __device__ symbol
    cudaMemcpyToSymbol(d_M, &h_M, sizeof(int));
    cudaMemcpyToSymbol(d_N, &h_N, sizeof(int));
    CUDAComputation<<<128, 128>>>();
}

If you need something larger that stays constant, you can use something like:

__device__ float* devPointer;
float* ptr;
cudaMalloc(&ptr, 256 * sizeof(float));
// copy the device pointer value into the __device__ symbol,
// so kernels can read devPointer without a kernel parameter
cudaMemcpyToSymbol(devPointer, &ptr, sizeof(ptr));
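Since M and N never change inside the loop, another option (a sketch, not part of the original answer; the kernel body and launch configuration here are hypothetical) is `__constant__` memory, which is cached on-chip and read-only from kernels. You copy the values once before the loop, and every launch afterwards reads them directly:

```
// __constant__ symbols live on the device and are cached for reads
__constant__ int c_M, c_N;

__global__ void CUDAComputation(double *a, double *b, double *c)
{
    // c_M and c_N are read directly; no kernel parameters needed
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < c_M * c_N)
        c[idx] = a[idx] + b[idx];  // placeholder computation
}

void runLoop(double *a, double *b, double *c, int M, int N)
{
    // copy once, before the loop; the values persist on the device
    cudaMemcpyToSymbol(c_M, &M, sizeof(int));
    cudaMemcpyToSymbol(c_N, &N, sizeof(int));

    for (int n = 0; n < 100; n++)
        CUDAComputation<<<128, 128>>>(a, b, c);
}
```

Note that `__constant__` memory is best for small values read uniformly by all threads in a warp; for two ints the practical difference versus kernel parameters is negligible, as noted above.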

Upvotes: 2
