Reputation: 6140
I am optimizing some code using CUDA. I am not sure whether I should call cudaMalloc inside the __global__ function fun1 or not (isn't x already allocated in GPU memory?):
__global__ void fun2(double *y)
{
    int i = blockIdx.x;
    y[i] = ...;
}
__global__ void fun1(double *x)
{
    // should I cudaMalloc() y for fun2, or just pass the x that was already allocated in main?
    fun2<<<N,1>>>(x);
    ...
}
int main(){
    double *x;
    ...
    cudaMalloc((void**)&x, N*sizeof(double));
    fun1<<<N,1>>>(x);
    ...
}
Upvotes: 0
Views: 142
Reputation: 583
Maybe you mean something like this:
__device__ void fun2(double *y)
{
    int i = blockIdx.x;
    y[i] = ...;
}
__global__ void fun1(double *x)
{
    fun2(x);   // ordinary device-function call, no <<<...>>> launch needed
    ...
}
int main(){
    double *x;
    ...
    cudaMalloc((void**)&x, N*sizeof(double));
    fun1<<<N,1>>>(x);
    ...
}
But it is more common to calculate the thread index in the __global__ function and pass it to the __device__ function, as in the sketch below.
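A minimal sketch of that pattern, assuming a placeholder computation in fun2 (the original code elides it with ...) and an example size and launch configuration:

#include <cuda_runtime.h>

__device__ void fun2(double *y, int i)
{
    y[i] = 2.0 * y[i];   // placeholder computation; the original code elides this
}

__global__ void fun1(double *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index, computed in the kernel
    if (i < n)
        fun2(x, i);
}

int main()
{
    const int N = 1024;                      // example size
    double *x;
    cudaMalloc((void**)&x, N * sizeof(double));
    fun1<<<(N + 255) / 256, 256>>>(x, N);    // example launch configuration
    cudaDeviceSynchronize();
    cudaFree(x);
    return 0;
}

Note that x is allocated once in main with cudaMalloc; the __device__ function just operates on the pointer it is given, so no additional allocation is needed inside the kernel.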
Upvotes: 1