Damian

Reputation: 3050

Calling a device function from global function

How should I access the 'do_sth' function from the 'print' function (see the code)? And why is the variable/constant 'N' visible to the GPU without using cudaMemcpy (see the code)?

 __device__ void do_sth(char *a, int N)
 {
         int idx = blockIdx.x * blockDim.x + threadIdx.x;
         if (idx < N)
         {
                 a[idx] = a[idx];
         }
 }


 __global__ void print(char *a, int N)
 {
         // question 1: why is there access to N here? It is now in GPU memory, how?
         int idx = blockIdx.x * blockDim.x + threadIdx.x;

         //do_sth<<<nblock2,blocksize2>>>(a,N); // error 1: a host function call cannot be configured
         //do_sth(&&a,N); // error 2: expected an expression

         if (idx < N)
         {
                 a[idx] = a[idx];
         }
 }

Upvotes: 1

Views: 14765

Answers (1)

CygnusX1

Reputation: 21818

  • A __global__ function (aka "kernel") already resides on the GPU. All of its parameters (the variables a and N) are passed through shared or constant memory (depending on your device generation) when the kernel is launched, so you can access them directly. There is a limit on the total parameter size: 256 B on pre-Fermi cards and 4 KB on Fermi, so if you have big chunks of data to transfer, you cannot avoid the cudaMemcpy functions.

  • The parameters of a __global__ function should not be modified.

  • When calling a __device__ function from a __global__ one, you do not specify configuration parameters in triple angle brackets. The __device__ function is executed by every thread that reaches the call in the kernel. Note that you can place the call inside an if statement to prevent some threads from executing it.

  • In the current version of CUDA it is impossible to spawn more threads during kernel execution.

  • There is no unary && operator in CUDA C++ (nor in standard C++; the new C++11 standard uses && in rvalue-reference declarations, but not as a unary operator on an expression). To pass the array, just write do_sth(a, N).
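To tie the bullets together, here is a minimal sketch of the corrected call. The launch configuration, buffer size, and the cudaMemset initialization are arbitrary choices for illustration; only the pointer a and the integer N travel as kernel parameters, the buffer contents themselves still live in device memory you allocated.

```cuda
#include <cuda_runtime.h>

// Same device helper as in the question: no launch configuration needed.
__device__ void do_sth(char *a, int N)
{
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < N)
        {
                a[idx] = a[idx];
        }
}

__global__ void print(char *a, int N)
{
        // Plain function call: every thread that reaches this line runs do_sth.
        // Not do_sth<<<...>>>(a, N) and not do_sth(&&a, N).
        do_sth(a, N);
}

int main()
{
        const int N = 16;
        char *d_a;
        cudaMalloc((void **)&d_a, N);   // device buffer: big data still needs
        cudaMemset(d_a, 0, N);          // explicit allocation/initialization;
                                        // the kernel parameters (d_a, N) are
                                        // passed in constant/shared memory.
        print<<<1, N>>>(d_a, N);
        cudaDeviceSynchronize();
        cudaFree(d_a);
        return 0;
}
```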

Upvotes: 6
