Reputation: 1145
I just want to pass a device function as an argument to a host function; the host function can then launch kernels that call this device-side function.
I tried the usual C++ way (passing by pointer/reference), and the CUDA debugger told me the kernel could not launch.
Update:
What I want to do is:
__host__ void hostfunction(int a, int (*DeviceFunction)(int))
{
/* ...do something... */
somekernel<<<blocks, threads>>>(in, DeviceFunction); /* in is an int * */
}
And call the host function with:
hostfunction(x, &SomeDeviceFunctionTemplate<int>);
Upvotes: 0
Views: 472
Reputation: 151879
This example might be of interest:
$ cat t237.cu
#include <stdio.h>
__device__ int f1(){ printf("dev f1\n"); return 0;}
__device__ int f2(){ printf("dev f2\n"); return 0;}
__device__ int f3(){ printf("dev f3\n"); return 0;}
__device__ int *fptrf1 = (int *)f1;
__device__ int *fptrf2 = (int *)f2;
__device__ int *fptrf3 = (int *)f3;
__global__ void mykernel(int (*fptr)()){
fptr();
printf("executed\n");
}
int main(){
int *hf1, *hf2, *hf3;
cudaMemcpyFromSymbol(&hf1, fptrf1, sizeof(int *));
cudaMemcpyFromSymbol(&hf2, fptrf2, sizeof(int *));
cudaMemcpyFromSymbol(&hf3, fptrf3, sizeof(int *));
mykernel<<<1,1>>>((int (*)())hf1);
cudaDeviceSynchronize();
mykernel<<<1,1>>>((int (*)())hf2);
cudaDeviceSynchronize();
mykernel<<<1,1>>>((int (*)())hf3);
cudaDeviceSynchronize();
return 0;
}
$ nvcc -arch=sm_20 -O3 -o t237 t237.cu
$ ./t237
dev f1
executed
dev f2
executed
dev f3
executed
$
I think this is roughly along the lines of what Jared was suggesting. As he mentioned, this will not be possible in host code:
&SomeDeviceFunctionTemplate<int>
assuming SomeDeviceFunctionTemplate refers to a __device__ function.
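For the templated case in the question, the same trick can probably be applied: take the address in device code (in the initializer of a __device__ variable), then copy that pointer value to the host and hand it to the host function. The following is only a sketch under assumptions. The body of SomeDeviceFunctionTemplate, the DevFn typedef, the d_fn_int variable, the run_on_device helper and the changed hostfunction signature are all made up for illustration and are not from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical alias for the function-pointer type used in the question.
typedef int (*DevFn)(int);

// Hypothetical stand-in for the question's template; the real body is unknown.
template <typename T>
__device__ T SomeDeviceFunctionTemplate(T x) { return x + 1; }

// The address is taken in a device-side initializer, not in host code.
__device__ DevFn d_fn_int = SomeDeviceFunctionTemplate<int>;

// Applies fn to every element of data.
__global__ void somekernel(int *data, int n, DevFn fn) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = fn(data[i]);
}

// Host function that receives a device function pointer and launches a kernel.
void hostfunction(int *d_data, int n, DevFn fn) {
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    somekernel<<<blocks, threads>>>(d_data, n, fn);
}

// Illustrative helper: fetch the pointer, run the kernel, copy results back.
bool run_on_device(int *h_data, int n) {
    DevFn h_fn = nullptr;
    // Copy the device function pointer value into a host variable.
    if (cudaMemcpyFromSymbol(&h_fn, d_fn_int, sizeof(DevFn)) != cudaSuccess) return false;
    int *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return false;
    bool ok = cudaMemcpy(d_data, h_data, n * sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        hostfunction(d_data, n, h_fn);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess;
    }
    if (ok) ok = cudaMemcpy(h_data, d_data, n * sizeof(int),
                            cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_data);
    return ok;
}

int main() {
    int h[4] = {0, 1, 2, 3};
    if (!run_on_device(h, 4)) { printf("CUDA error\n"); return 1; }
    for (int i = 0; i < 4; ++i) printf("%d ", h[i]);
    printf("\n");
    return 0;
}
```

The pointer value held in h_fn is only meaningful on the device; the host code just carries it through to the kernel launch and must not call it.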
Upvotes: 2
Reputation: 21108
It'd be helpful if you could post an example of what you are trying to do, but one thing to check is that you are compiling and running on Fermi (sm_20) or later since older GPUs did not support non-inlined function calls.
Check the compute capability of your device (needs 2.0 or later) and check your nvcc command line (needs -arch=sm_20 or later, or the -gencode equivalent).
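If you are not sure what your device reports, a small program along these lines can print it. This is just a sketch; it assumes device 0 is the GPU you are running on, and the helper name supports_device_function_calls is made up here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative helper: compute capability 2.0 (Fermi) or later is needed.
bool supports_device_function_calls(int major) {
    return major >= 2;
}

int main() {
    int dev = 0;  // assumption: the GPU of interest is device 0
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    if (!supports_device_function_calls(prop.major)) {
        printf("This device is older than sm_20.\n");
    }
    return 0;
}
```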
Upvotes: 1