Reputation: 183
In an unexpected manner,powf
produces a strange output for odd base numbers when its type is int
. For example powf(-4,2)
returns 16
but powf(-5,2)
returns24
!!!
After tracing the root of a wrong output in a long calculation, I figured out that powf
function shows strange behavior for odd numbers when output type is an integer
.
__global__ void intFoo( int* a)
{
*a = powf(*a, 2);
}
__global__ void doubleFoo( double* a)
{
*a = powf(*a, 2);
}
I can call this kernel (for instance) in Matlab :
!nvcc -ptx test.cu
k1 = parallel.gpu.CUDAKernel('test.ptx', 'test.cu', 'intFoo');
k2 = parallel.gpu.CUDAKernel('test.ptx', 'test.cu', 'doubleFoo');
out1 = feval(k1, -4)
out2 = feval(k1, -5)
out3 = feval(k2, -4)
out4 = feval(k2, -5)
result:
out1 = 16
out2 = 24 //This hasn't to be 25 !!??
out3 = 16
out4 = 25.000
EDIT:
After investigating in Matlab by @Robert Crovella suggestion, I found out that Command Window in Matlab shows out4=25.000
as opposed to Variables Window which shows the content of out4 = 24.9999981
.
Everyone should be very cautious as there is a small error associated with output of powf
function (24.9999981
instead of 25
) that may get propagated and become a problem with large calculations
Upvotes: 0
Views: 838
Reputation: 151859
I believe this is due to unwise usage of datatypes with feval
.
It appears to me that feval
converts the return type to be the same type as the parameter type. This makes sense, since the return type is extracted from the pointer to the passed argument for that parameter.
note that powf
takes float
parameters and returns a float
, and pow
takes double
parameters and returns a double
. int
quantities don't have a separate function (prototype) in the CUDA math API, so if you use them they will be cast to and from floating point types.
Here's what I see in pure CUDA C++:
$ cat t32.cu
#include <math.h>
#include <stdio.h>
__global__ void Foo( int a, double b)
{
float res = powf((float)a, 2);
printf("powf_int: %d, %d, %f\n", a, (int)res, res);
res = powf((float)b, 2);
printf("powf_double: %f, %f, %f\n", b, (double)res, res);
double dres = pow((double)a, 2);
printf("pow_int: %d, %d, %f\n", a, (int)dres, dres);
dres = pow((double)b, 2);
printf("pow_double: %f, %f, %f\n", b, (double)dres, dres);
}
int main(){
Foo<<<1,1>>>(-5, -5);
cudaDeviceSynchronize();
}
$ nvcc -o t32 t32.cu
$ cuda-memcheck ./t32
========= CUDA-MEMCHECK
powf_int: -5, 24, 24.999998
powf_double: -5.000000, 24.999998, 24.999998
pow_int: -5, 25, 25.000000
pow_double: -5.000000, 25.000000, 25.000000
========= ERROR SUMMARY: 0 errors
$
Note that:
powf
returns 24.999998 for (-5,2)
int
it gets truncated to 24double
and then round to 3 decimal places, the correctly rounded result would be 25.000 just as displayed in your matlab outputSuggestions:
powf(x, 2)
and will possibly be more accurate also.If you want to know "why does CUDA powf(-5, 2)
return 24.999998?", please ask that in a separate question. The accuracy is defined in the programming manual and I'm reasonably sure this falls within the published error bounds. Here is another example of pow
"weirdness".
Upvotes: 4
Reputation: 26085
As an addendum to Robert Crovella's answer: CUDA is a subset of C++ and therefore provides overloaded math functions. In particular it offers the following four variants of pow()
:
float pow (float, int);
double pow (double, int);
float pow (float, float);
double pow (double, double);
If you check the machine code generated for these variants with cuobjdump --dump-sass
you will find that four different implementations are used. As Robert Crovella pointed out, for the particular case of squaring it is best to simply use multiplication, but you could certainly use pow()
if you wanted to, as demonstrated by the following code (error checks omitted for conciseness):
#include <cmath>
#include <cstdlib>
#include <cstdio>
__global__ void kernel (int ib, float fa, float fb, double da, double db)
{
printf ("pow_float_int = %15.8e\n", pow (fa, ib));
printf ("pow_float_float = %15.8e\n", pow (fa, fb));
printf ("pow_double_int = %23.16e\n", pow (da, ib));
printf ("pow_double_double = %23.16e\n", pow (da, db));
}
int main (void)
{
int ia = -5, ib = 2;
float fa = ia, fb = ib;
double da = ia, db = ib;
kernel<<<1,1>>>(ib, fa, fb, da, db);
cudaDeviceSynchronize();
return EXIT_SUCCESS;
}
The output of the above program should look something like this:
pow_float_int = 2.50000000e+01
pow_float_float = 2.49999981e+01
pow_double_int = 2.5000000000000000e+01
pow_double_double = 2.5000000000000000e+01
Upvotes: 2