Reputation: 1091
I'm working on a project that involves a lot of mathematics. For a single target problem (for example, gradient calculation) we always have two implementations: a CPU version and a CUDA version.
The CPU version is written in regular C++ and the kernel version in CUDA. If I want to define a small function, for example vec_weight, which returns the weight of a vector, I have to write one version for the CPU, compiled by g++, and one CUDA version marked "__device__", compiled by nvcc.
I'm not trying to define a "__device__ __host__" function here. What I want is a kind of library that can be called both from regular C++ functions and from CUDA kernels. I tried to use the "__CUDACC__" macro, but it didn't work.
Because we will have a lot of small utility functions needed by both the CPU and GPU versions, I think it is reasonable to combine them into one.
Writing the CPU version in a .cu file instead of a .cpp file might solve our problem, but that is not what we want.
So what should I do?
Here is the code segment:
head.h:
1 #ifndef HEAD_H
2 #define HEAD_H
3 #ifdef __cplusplus
4 extern "C"{
5 #endif
6 __device__ __host__ void myprint();
7 #ifdef __cplusplus
8 }
9 #endif
10 #endif
head.cu:
1 #include "head.h"
2 #include <stdio.h>
3 void myprint(){
4 // do something here
5 }
main.cpp
1 #include "head.h"
2 int main(){
3 myprint();
4 }
I compiled the head.cu by:
nvcc -c head.cu
Link them together by :
g++ main.cpp head.o -o main (The reason I didn't use nvcc here is that we are using PGI's pgcpp in our project, and we need it to talk to PGI's OMP library. I'm sure something is wrong here, but I don't know how to fix it.)
The error messages:
In file included from main.cpp:18:
head.h:6: error: ‘__device__’ does not name a type
main.cpp: In function ‘int main()’:
main.cpp:20: error: ‘myprint’ was not declared in this scope
So I'm pretty sure that g++ doesn't recognize the "__device__" prefix here. But our project demands that we use pgcpp to compile the .cpp files, because that is the only way we can have OMP directives work correctly in both Fortran and C (our project mixes C/C++, Fortran, and CUDA). But here even g++ doesn't work, so I think we have to fix this first.
Upvotes: 1
Views: 3226
Reputation: 9474
Libraries usually contain code compiled for the target processor (CPU or GPU), so you need to compile it through nvcc anyway. Hence, you may as well put it in .cu files.
If you can release sources, then you can put the code in headers and include them into .cpp or .cu files.
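As a minimal sketch of that header-only approach (the file name mathutil.h, the HD macro, and the vec_weight function are my own illustrative choices, not from the question): the header hides the CUDA qualifiers behind the __CUDACC__ macro, so g++ or pgcpp sees plain C++ while nvcc sees a __host__ __device__ function.

```cpp
// mathutil.h -- hypothetical shared header, included from both .cpp and .cu files
#ifndef MATHUTIL_H_
#define MATHUTIL_H_

#ifdef __CUDACC__
#define HD __host__ __device__   // nvcc: callable from host and device
#else
#define HD                       // g++/pgcpp: plain C++ function
#endif

// Weight (squared length) of a 3-vector, as one example of the small
// utility functions the question mentions.
HD inline float vec_weight(const float v[3]) {
    return v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
}

#endif
```

Because the function is defined inline in the header, each translation unit (host or device) compiles its own copy with the right qualifiers, and no separate library is needed.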
Update
This is what I did in my code (the hdf function can be called from the host and the device):
File devhost.h:
#ifndef FUNCTIONS_H_
#define FUNCTIONS_H_
int myhost_function(int);
#endif
File cu.cu:
__host__ __device__
int hdf(int a) {
return a + 4;
}
int myhost_function(int a) {
return hdf(a);
}
__global__
void kern(int *data) {
data[threadIdx.x] = hdf(data[threadIdx.x]);
}
File cpp.cpp:
#include <stdio.h>
#include <stdlib.h>
#include "devhost.h"
int main() {
printf ("%d\n", myhost_function(5));
return 0;
}
This is how I compile and link it:
nvcc -c cu.cu
gcc -c cpp.cpp
gcc cpp.o cu.o -lcudart -L/usr/local/cuda-5.5/lib64
Note that you need to link against cudart because the .cu file contains a device call.
Update 2
A slightly less elegant approach that still compiles is to put something like the following in your header file:
#ifdef __CUDACC__
__host__ __device__
#endif
static int anotherfunction(int a) {
return a * 50;
}
In this case you will have a copy of the code in every translation unit; this will increase your compilation time and might increase the executable size.
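A small variation on the same idea (my own sketch, not from the answer): marking the function inline instead of static lets the linker merge the identical copies from different translation units, rather than keeping one private copy per object file.

```cpp
// Hypothetical shared header: inline instead of static, so duplicate
// definitions across translation units are merged at link time.
#ifdef __CUDACC__
__host__ __device__
#endif
inline int anotherfunction(int a) {
    return a * 50;
}
```

Either way the function must be defined (not just declared) in the header, since each compiler needs the body to generate code for its target.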
Upvotes: 3