Reputation: 6140
I am optimizing some code using CUDA. I am not sure whether I should call cudaMalloc inside the __global__ function fun1 or not (isn't x already allocated in GPU memory?):
__global__ void fun2(double *y)
{
    int i = blockIdx.x;
    y[i] = ...;
}
__global__ void fun1(double *x)
{
    // should I cudaMalloc() y for fun2, or just pass the x that was already allocated in main?
    fun2<<<N,1>>>(x);
    ...
}
int main(){
    double *x;
    ...
    cudaMalloc((void**)&x, N*sizeof(double));
    fun1<<<N,1>>>(x);
    ...
}
Upvotes: 0
Views: 142
Reputation: 583
Maybe you mean something like this:
__device__ void fun2(double *y)
{
    int i = blockIdx.x;
    y[i] = ...;
}
__global__ void fun1(double *x)
{
    fun2(x);   // ordinary device-function call, no <<<...>>> launch needed
    ...
}
int main(){
    double *x;
    ...
    cudaMalloc((void**)&x, N*sizeof(double));
    fun1<<<N,1>>>(x);
    ...
}
But it is more common to calculate the thread index in the __global__ function and pass it to the __device__ function, as in the sketch below.
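A minimal sketch of that pattern, assuming a placeholder computation in fun2 (the original code elides it with ...) and an example size and launch configuration:

#include <cuda_runtime.h>

__device__ void fun2(double *y, int i)
{
    y[i] = 2.0 * y[i];   // placeholder computation; the original code elides this
}

__global__ void fun1(double *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index, computed in the kernel
    if (i < n)
        fun2(x, i);
}

int main()
{
    const int N = 1024;                      // example size
    double *x;
    cudaMalloc((void**)&x, N * sizeof(double));
    fun1<<<(N + 255) / 256, 256>>>(x, N);    // example launch configuration
    cudaDeviceSynchronize();
    cudaFree(x);
    return 0;
}

Note that x is allocated once in main with cudaMalloc; the __device__ function just operates on the pointer it is given, so no additional allocation is needed inside the kernel.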
Upvotes: 1