Reputation: 3050
How should I access the 'do_sth' function from the 'print' function (see the code)? Why is the 'N' variable/constant (see the code) visible to the GPU without using cudaMemcpy?
__device__ void do_sth(char *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
    {
        a[idx] = a[idx];
    }
}

__global__ void print(char *a, int N)
{
    //question_1: why is N accessible here? It is now in GPU memory - how?
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    //do_sth<<<nblock2,blocksize2>>>(a,N); //error_1: a host function call can not be configured
    //do_sth(&&a,N); //error_2: expected an expression
    if (idx < N)
    {
        a[idx] = a[idx];
    }
}
Upvotes: 1
Views: 14765
Reputation: 21818
A __global__ function (a.k.a. "kernel") already resides on the GPU. All of its parameters (the variables a and N) are passed through shared or constant memory (depending on your device type) at launch time, so you can access those variables directly. There is a limit on the total parameter size: 256 B on pre-Fermi cards and 4 KB on Fermi, so if you have big chunks of data to transfer, you cannot avoid the cudaMemcpy functions.
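
To illustrate, here is a minimal host-side sketch (the grid and block sizes are arbitrary assumptions, not taken from the question) showing that only the array contents need cudaMemcpy, while the pointer and N travel as ordinary by-value kernel arguments:

#include <cuda_runtime.h>

__global__ void print(char *a, int N);   // the kernel from the question

int main()
{
    const int N = 1024;
    char h_a[N] = {0};

    // Bulk data lives in device memory and needs an explicit copy...
    char *d_a;
    cudaMalloc(&d_a, N);
    cudaMemcpy(d_a, h_a, N, cudaMemcpyHostToDevice);

    // ...but the kernel arguments themselves (the pointer d_a and the int N)
    // are passed by value at launch time - the runtime places them in
    // shared/constant memory for you, so no cudaMemcpy is needed for N.
    print<<<(N + 255) / 256, 256>>>(d_a, N);

    cudaDeviceSynchronize();
    cudaFree(d_a);
    return 0;
}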
__global__ function parameters should not be modified.
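
If you do need to change such a value inside the kernel, one option is to work on a local copy instead; a tiny sketch (the halving loop is purely illustrative):

__global__ void halve(char *a, int N)
{
    int n = N;        // local copy - safe to modify
    while (n > 1)
    {
        n /= 2;       // mutate the copy, not the parameter itself
    }
}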
When calling a __device__ function from a __global__ function, you do not specify the configuration parameters in the triple angle brackets. The __device__ function will be executed by every thread that reaches the call inside the kernel. Note that you can place such calls inside if statements to prevent some threads from executing them.
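
Applied to the code from the question, the fix for error_1 and error_2 is therefore a plain call, with no <<<...>>> configuration and no extra & on the pointer:

__device__ void do_sth(char *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
    {
        a[idx] = a[idx];
    }
}

__global__ void print(char *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)          // guard: only in-range threads make the call
    {
        do_sth(a, N);     // plain function call, executed per thread
    }
}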
In the current version of CUDA it is impossible to spawn additional threads during kernel execution.
There is no unary && operator in CUDA C++ (there was no such operator in standard C++ either; I am not sure about that now that the new standard is emerging).
Upvotes: 6