Damian

Reputation: 3050

Calling a device function from global function

How should I access the 'do_sth' function from the 'print' function (see the code)? And why is the variable/constant 'N' visible to the GPU without using cudaMemcpy (see the code)?

 __device__ void do_sth(char *a, int N)
 {
         int idx = blockIdx.x * blockDim.x + threadIdx.x;
         if (idx < N)
         {
                 a[idx] = a[idx];
         }
 }


 __global__ void print(char *a, int N)
 {
         // question 1: why is there access to N here? It is now in GPU memory, how?
         int idx = blockIdx.x * blockDim.x + threadIdx.x;

         //do_sth<<<nblock2,blocksize2>>>(a,N); // error 1: a host function call cannot be configured
         //do_sth(&&a,N); // error 2: expected an expression

         if (idx < N)
         {
                 a[idx] = a[idx];
         }
 }

Upvotes: 1

Views: 14765

Answers (1)

CygnusX1

Reputation: 21818

  • A __global__ function (aka "kernel") already resides on the GPU. All of its parameters (the variables a and N) are passed through shared or constant memory (depending on your device generation) when the kernel is launched, so you can access them directly. There is a limit on the total parameter size: 256 B on pre-Fermi cards and 4 KB on Fermi, so if you have big chunks of data to transfer, you cannot avoid the cudaMemcpy functions.

  • The parameters of a __global__ function should not be modified.

  • When calling a __device__ function from a __global__ one, you do not specify configuration parameters in triple angle brackets. The __device__ function is executed by every thread that reaches the call in the kernel. Note that you can place the call inside an if statement to prevent some threads from executing it.

  • In the current version of CUDA it is impossible to spawn more threads during kernel execution.

  • There is no unary && operator in CUDA C++ (nor in standard C++; the new C++11 standard uses && in rvalue-reference declarations, but not as a unary operator on an expression). To pass the array, just write do_sth(a, N).
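To tie the bullets together, here is a minimal sketch of the corrected call. The launch configuration, buffer size, and the cudaMemset initialization are arbitrary choices for illustration; only the pointer a and the integer N travel as kernel parameters, the buffer contents themselves still live in device memory you allocated.

```cuda
#include <cuda_runtime.h>

// Same device helper as in the question: no launch configuration needed.
__device__ void do_sth(char *a, int N)
{
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < N)
        {
                a[idx] = a[idx];
        }
}

__global__ void print(char *a, int N)
{
        // Plain function call: every thread that reaches this line runs do_sth.
        // Not do_sth<<<...>>>(a, N) and not do_sth(&&a, N).
        do_sth(a, N);
}

int main()
{
        const int N = 16;
        char *d_a;
        cudaMalloc((void **)&d_a, N);   // device buffer: big data still needs
        cudaMemset(d_a, 0, N);          // explicit allocation/initialization;
                                        // the kernel parameters (d_a, N) are
                                        // passed in constant/shared memory.
        print<<<1, N>>>(d_a, N);
        cudaDeviceSynchronize();
        cudaFree(d_a);
        return 0;
}
```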

Upvotes: 6
