Mohsen

Reputation: 183

strange behavior of powf() function

Unexpectedly, powf produces strange output for odd base numbers when its output type is int. For example, powf(-4, 2) returns 16, but powf(-5, 2) returns 24!

After tracing the root of a wrong result in a long calculation, I figured out that the powf function behaves strangely for odd numbers when the output type is an integer.

__global__ void intFoo( int* a) 
{
    *a = powf(*a, 2);
}
__global__ void doubleFoo( double* a) 
{
    *a = powf(*a, 2);
}

I can call this kernel (for instance) in Matlab :

!nvcc -ptx test.cu 
k1 = parallel.gpu.CUDAKernel('test.ptx', 'test.cu', 'intFoo');
k2 = parallel.gpu.CUDAKernel('test.ptx', 'test.cu', 'doubleFoo');
out1 = feval(k1, -4)
out2 = feval(k1, -5)
out3 = feval(k2, -4)
out4 = feval(k2, -5)

result:

out1 = 16
out2 = 24 // Shouldn't this be 25!?
out3 = 16
out4 = 25.000

EDIT:

After investigating in Matlab following @Robert Crovella's suggestion, I found that the Matlab Command Window shows out4 = 25.000, whereas the Variables window shows the actual content of out4 as 24.9999981.

Everyone should be very cautious: there is a small error in the output of the powf function (24.9999981 instead of 25) that may propagate and become a problem in large calculations.

Upvotes: 0

Views: 838

Answers (2)

Robert Crovella

Reputation: 151859

I believe this is due to unwise usage of datatypes with feval.

It appears to me that feval converts the return type to be the same type as the parameter type. This makes sense, since the return type is extracted from the pointer to the passed argument for that parameter.

Note that powf takes float parameters and returns a float, and pow takes double parameters and returns a double. int quantities don't have a separate function (prototype) in the CUDA math API, so if you use them, they will be cast to and from floating-point types.

Here's what I see in pure CUDA C++:

$ cat t32.cu
#include <math.h>
#include <stdio.h>

__global__ void Foo( int a, double b)
{
            float res = powf((float)a, 2);
            printf("powf_int: %d, %d, %f\n", a, (int)res, res);
            res = powf((float)b, 2);
            printf("powf_double: %f, %f, %f\n", b, (double)res, res);
            double dres = pow((double)a, 2);
            printf("pow_int: %d, %d, %f\n", a, (int)dres, dres);
            dres = pow((double)b, 2);
            printf("pow_double: %f, %f, %f\n", b, (double)dres, dres);
}

int main(){

        Foo<<<1,1>>>(-5, -5);
        cudaDeviceSynchronize();
}
$ nvcc -o t32 t32.cu
$ cuda-memcheck ./t32
========= CUDA-MEMCHECK
powf_int: -5, 24, 24.999998
powf_double: -5.000000, 24.999998, 24.999998
pow_int: -5, 25, 25.000000
pow_double: -5.000000, 25.000000, 25.000000
========= ERROR SUMMARY: 0 errors
$

Note that:

  1. CUDA powf returns 24.999998 for (-5,2)
  2. if we convert this to int it gets truncated to 24
  3. if we convert this to double and then round to 3 decimal places, the correctly rounded result would be 25.000, just as displayed in your Matlab output

Suggestions:

  1. don't do this
  2. don't use integer types with floating point functions (especially casting the result)
  3. if you want to square something, just multiply it by itself. It will definitely be faster than using powf(x, 2) and will possibly be more accurate as well.

If you want to know "why does CUDA powf(-5, 2) return 24.999998?", please ask that in a separate question. The accuracy is defined in the programming manual and I'm reasonably sure this falls within the published error bounds. Here is another example of pow "weirdness".

Upvotes: 4

njuffa

Reputation: 26085

As an addendum to Robert Crovella's answer: CUDA is a subset of C++ and therefore provides overloaded math functions. In particular it offers the following four variants of pow():

float pow (float, int); 
double pow (double, int); 
float pow (float, float); 
double pow (double, double);

If you check the machine code generated for these variants with cuobjdump --dump-sass you will find that four different implementations are used. As Robert Crovella pointed out, for the particular case of squaring it is best to simply use multiplication, but you could certainly use pow() if you wanted to, as demonstrated by the following code (error checks omitted for conciseness):

#include <cmath>
#include <cstdlib>
#include <cstdio>

__global__ void kernel (int ib, float fa, float fb, double da, double db)
{
    printf ("pow_float_int     = %15.8e\n", pow (fa, ib));
    printf ("pow_float_float   = %15.8e\n", pow (fa, fb));
    printf ("pow_double_int    = %23.16e\n", pow (da, ib));
    printf ("pow_double_double = %23.16e\n", pow (da, db));
}

int main (void)
{
    int ia = -5, ib = 2;
    float fa = ia, fb = ib;
    double da = ia, db = ib;

    kernel<<<1,1>>>(ib, fa, fb, da, db);
    cudaDeviceSynchronize();
    return EXIT_SUCCESS;
}

The output of the above program should look something like this:

pow_float_int     =  2.50000000e+01
pow_float_float   =  2.49999981e+01
pow_double_int    =  2.5000000000000000e+01
pow_double_double =  2.5000000000000000e+01

Upvotes: 2
