user2188453

Reputation: 1145

How to pass device function as an input argument to host-side function?

I want to pass a device function as an argument to a host function, so that the host function can then launch kernels that call this device-side function.

I tried the usual C++ way (passing by pointer/reference), but the CUDA debugger reported that the kernel could not launch.

Update:

What I want to do is:

__host__ void hostfunction(int a, int (*DeviceFunction)(int))
{
   /* ...do something... */
   somekernel<<<blocks, threads>>>(in, DeviceFunction);
}

And call the host function with:

hostfunction(x, &SomeDeviceFunctionTemplate<int>);

Upvotes: 0

Views: 472

Answers (2)

Robert Crovella

Reputation: 151879

This example might be of interest:

$ cat t237.cu
#include <stdio.h>


__device__ int f1(){ printf("dev f1\n"); return 0;}
__device__ int f2(){ printf("dev f2\n"); return 0;}
__device__ int f3(){ printf("dev f3\n"); return 0;}

// Take each function's address in device code and keep it in a __device__ variable
__device__ int *fptrf1 = (int *)f1;
__device__ int *fptrf2 = (int *)f2;
__device__ int *fptrf3 = (int *)f3;


__global__ void mykernel(int (*fptr)()){

  fptr();
  printf("executed\n");
}

int main(){

  // Copy the device-side function addresses to the host so they can be passed as kernel arguments
  int *hf1, *hf2, *hf3;
  cudaMemcpyFromSymbol(&hf1, fptrf1, sizeof(int *));
  cudaMemcpyFromSymbol(&hf2, fptrf2, sizeof(int *));
  cudaMemcpyFromSymbol(&hf3, fptrf3, sizeof(int *));
  mykernel<<<1,1>>>((int (*)())hf1);
  cudaDeviceSynchronize();
  mykernel<<<1,1>>>((int (*)())hf2);
  cudaDeviceSynchronize();
  mykernel<<<1,1>>>((int (*)())hf3);
  cudaDeviceSynchronize();
  return 0;
}
$ nvcc -arch=sm_20 -O3 -o t237 t237.cu
$ ./t237
dev f1
executed
dev f2
executed
dev f3
executed
$

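I think this is roughly along the lines of what Jared was suggesting. As he mentioned, this will not be possible in host code, assuming SomeDeviceFunctionTemplate refers to a __device__ function:

&SomeDeviceFunctionTemplate<int>

The address of a __device__ function has to be taken in device code, as the initializers of fptrf1, fptrf2, and fptrf3 do above, and then copied to the host before it is passed to a kernel.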
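
Applied to the question's case, the same idea might look like the sketch below. It is only an illustration under assumptions: the body of SomeDeviceFunctionTemplate (here it squares its argument), the unary_fn typedef, the d_square_int variable, and the kernel and host function signatures are all made up for the example and are not taken from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Function-pointer type matching the question's int (*)(int) signature (hypothetical name).
typedef int (*unary_fn)(int);

// Hypothetical stand-in for the question's template; the real body is unknown.
template <typename T>
__device__ T SomeDeviceFunctionTemplate(T x) { return x * x; }

// The address is taken in device code (a __device__ variable initializer),
// which also instantiates the template for int.
__device__ unary_fn d_square_int = SomeDeviceFunctionTemplate<int>;

// Kernel that applies the passed-in device function to each element.
__global__ void somekernel(int *data, int n, unary_fn f)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = f(data[i]);
}

// Host function that receives the device function pointer as an argument
// and launches the kernel with it. h_data is a host array of n ints.
void hostfunction(int *h_data, int n, unary_fn f)
{
    int *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    somekernel<<<blocks, threads>>>(d_data, n, f);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
}

int main()
{
    // Fetch the device-side function address into a host variable.
    unary_fn h_f = nullptr;
    cudaMemcpyFromSymbol(&h_f, d_square_int, sizeof(unary_fn));

    int values[4] = {1, 2, 3, 4};
    hostfunction(values, 4, h_f);

    for (int i = 0; i < 4; ++i) printf("%d\n", values[i]);
    return 0;
}
```

The host only ever treats h_f as an opaque value: it cannot call it, only hand it back to device code. Error checking on the CUDA calls is omitted to keep the sketch short.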

Upvotes: 2

Tom

Reputation: 21108

It would help if you could post an example of what you are trying to do. One thing to check is that you are compiling for and running on Fermi (sm_20) or later, since older GPUs did not support non-inlined function calls.

Check the compute capability of your device (needs 2.0 or later) and check your nvcc command line (needs -arch=sm_20 or later, or the -gencode equivalent).
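
One way to check the compute capability from code is to query the device properties through the runtime API. This is a minimal sketch; the helper name supports_cc is made up for the example, while cudaGetDeviceCount and cudaGetDeviceProperties are the standard runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the device reports compute capability
// of at least major.minor, false on any query error.
bool supports_cc(int device, int major, int minor)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    return prop.major > major || (prop.major == major && prop.minor >= minor);
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // 2.0 is the minimum named above for non-inlined function calls.
        printf("Device %d: %s, compute capability %d.%d, %s\n",
               dev, prop.name, prop.major, prop.minor,
               supports_cc(dev, 2, 0) ? "2.0 or later" : "older than 2.0");
    }
    return 0;
}
```

The device's capability is only half of the check: the -arch (or -gencode) value on the nvcc command line also has to target 2.0 or later.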

Upvotes: 1
