Reputation: 3744

pow is not working properly inside a __device__ function in CUDA

I am trying to use the pow function inside a __device__ function in CUDA, using Visual Studio 2019.

__device__ double Len(double a, double b)
{
    return pow(a, 2) + pow(b, 2);
}

However, it keeps giving me this error when I try to build the solution.

Error Undefined reference to '_Z3powdi' in 'x64/Debug/kernel.cu.obj'

It only works when I change 2 to 2.0. I thought the function might simply require non-integer values as its parameters, but when I tried the same call in ordinary C++ code, it worked properly with the integer 2.

What is the reason for this problem, and how can I solve it?

Notes:

It was working normally a few days ago. The error probably started after the latest update of Visual Studio 2019, version 16.8.0.
I tried both adding and removing #include <math.h>, but it gave the same error either way.

Upvotes: 3

Answers (3)

Tan Su

Reputation: 39

Update: I changed all pow(float) calls to powf(float) and the problem was solved. If you can modify the code, I recommend modifying it rather than sticking with the old version.

My CUDA project also got some weird build issues after the VS 16.8 update. After rolling back to 16.7.8, the issue was fixed.

Link to an old version of the Visual Studio installer

Upvotes: 1

Shahin Dohan

Reputation: 6892

Note: I'm not a C++ developer, so forgive me if I'm speaking nonsense, or if my solution is bad.

We just had the same problem after updating our MSVC C++ toolset to v142 (19.28) in our C++/CLI project. The problem appeared only at run time, when calling OptiX functions.

Our kernel.cu was calling non-existent functions from the CUDA Math API, even though IntelliSense suggested that it was calling the standard library functions. I don't know what was happening or why, but it works now.

The CUDA Math API has two pow functions:

double pow ( double x, double y )
float powf ( float x, float y )

I simply cast my integer arguments to double (static_cast<double>(3)) and, in another call that takes float arguments, changed the call from pow to powf.
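
A minimal sketch of that workaround, under a few assumptions: the HOST_DEVICE macro and the function names LenDouble and LenFloat are hypothetical and only used for illustration, and the macro expands to nothing when the file is not compiled by nvcc, so the same functions can also be checked with an ordinary C++ compiler. Both arguments of each call have the same floating-point type, so the calls should resolve to the two signatures listed above rather than to pow(double, int).

```cpp
#include <math.h>

// Hypothetical helper macro: under nvcc the functions are compiled for both
// host and device; with a plain C++ compiler the qualifiers disappear.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Double precision: the exponent is explicitly a double, either written as
// a literal (2.0) or cast from an integer, so pow(double, double) is used.
HOST_DEVICE inline double LenDouble(double a, double b)
{
    return pow(a, 2.0) + pow(b, static_cast<double>(2));
}

// Single precision: powf takes float arguments and returns a float.
HOST_DEVICE inline float LenFloat(float a, float b)
{
    return powf(a, 2.0f) + powf(b, 2.0f);
}
```

Inside a kernel these would be called like any other __device__ function, for example LenDouble(a, b) with double arguments.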

Upvotes: 1

njuffa

Reputation: 26085

CUDA has supported pow (double, int) in device code since proper double-precision support was added around 2008. This was a required function since at least the C++98 standard (ISO/IEC 14882 section 26.5). Here is a complete example program incorporating OP's function, with error checking omitted for brevity:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

__device__ double Len(double a, double b)
{
    return pow(a, 2) + pow(b, 2);
}

__global__ void kernel (double a, double b)
{
    printf ("len = %23.16e\n", Len(a, b));
}

int main (void)
{
    kernel<<<1,1>>>(3,4);
    cudaDeviceSynchronize();
    return EXIT_SUCCESS;
}

This compiles without errors on

CUDA 9.2 with MSVS 2010 (Microsoft (R) C/C++ Optimizing Compiler Version 16.00.40219.01 for x64) on Windows 7
CUDA 11.1 with MSVS 2019 (Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29112 for x64) on Windows 10

I compiled for release and debug builds as follows (arguments in curly braces for debug build):

nvcc -o pow_dbl_int.exe {-g -G} pow_dbl_int.cu

The executable when run produces this output:

len =  2.5000000000000000e+01

If this example program does not compile correctly with the command line as shown, I would suspect that there is something messed up with the MSVS installation or the CUDA installation. In my practice, I find it generally beneficial to first install MSVS and then CUDA, so CUDA can integrate properly into MSVS when installed.

Since OP apparently installed a version of MSVS that shipped only a few days ago on November 10, 2020, there is also a possibility of an incompatibility between a host compiler header file and a CUDA header file, which is the reason why CUDA has historically imposed tight checks for supported host compiler versions (not sure whether it does so now). I note that Microsoft has since released MSVS 2019 16.8.1, with a release date of November 12, 2020.

As noted in multiple comments and also in the CUDA Best Practices Guide, squaring is more easily accomplished with just a multiply, and there is no need to invoke pow().
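
As a rough illustration of that last point, OP's function could be written with plain multiplications, which sidesteps the pow overload question entirely. The HOST_DEVICE macro below is a hypothetical convenience (not part of CUDA) that lets the same function build with or without nvcc; in a .cu file a plain __device__ qualifier would do.

```cpp
// Hypothetical helper macro: expands to the CUDA qualifiers under nvcc and
// to nothing with an ordinary C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Same computation as OP's Len(), with each square written as a multiply
// instead of a call to pow().
HOST_DEVICE inline double Len(double a, double b)
{
    return a * a + b * b;
}
```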

Upvotes: 5
