Reputation: 75
I have a problem with CUDA programing ! Input is a matrix A( 2 x 2 ) Ouput is a matrix A( 2 x 2 ) with every new value is **3 exponent of the old value ** example : input : A : { 2,2 } output : A { 8,8 } { 2,2 } { 8,8 }
I have 2 function in file CudaCode.CU :
__global__ void Power_of_02(int &a)
{
a=a*a;
}
//***************
__global__ void Power_of_03(int &a)
{
int tempt = a;
Power_of_02(a); //a=a^2;
a= a*tempt; // a = a^3
}
and Kernel :
__global__ void CudaProcessingKernel(int *dataA ) //kernel function
{
int bx = blockIdx.x;
int tx = threadIdx.x;
int tid = bx * XTHREADS + tx;
if(tid < 16)
{
Power_of_03(dataA[tid]);
}
__syncthreads();
}
I think it's right, but the error appear : calling a __global__ function("Power_of_02") from a __global__ function("Power_of_03") is only allowed on the compute_35 architecture or above
Why I wrong ? How to repair it ?
Upvotes: 3
Views: 5933
Reputation: 152249
The error is fairly explanatory. A CUDA function decorated with __global__
represents a kernel. Kernels can be launched from host code. On cc 3.5 or higher GPUs, you can also launch a kernel from device code. So if you call a __global__
function from device code (i.e. from another CUDA function that is decorated with __global__
or __device__
), then you must be compiling for the appropriate architecture. This is called CUDA dynamic parallelism, and you should read the documentation to learn how to use it, if you want to use it.
When you launch a kernel, whether from host or device code, you must provide a launch configuration, i.e. the information between the triple-chevron notation:
CudaProcessingKernel<<<grid, threads>>>(d_A);
If you want to use your power-of-2 code from another kernel, you will need to call it in a similar, appropriate fashion.
Based on the structure of your code, however, it seems like you can make things work by declaring your power-of-2 and power-of-3 functions as __device__
functions:
__device__ void Power_of_02(int &a)
{
a=a*a;
}
//***************
__device__ void Power_of_03(int &a)
{
int tempt = a;
Power_of_02(a); //a=a^2;
a= a*tempt; // a = a^3
}
This should probably work for you and perhaps was your intent. Functions decorated with __device__
are not kernels (so they are not callable directly from host code) but are callable directly from device code on any architecture. The programming guide will also help to explain the difference.
Upvotes: 6