Reputation: 149

Floating point limits code not producing correct results

I am racking my brain trying to figure out why this code does not get the right result. I am looking for the hexadecimal representations of the floating point positive and negative overflow/underflow levels. The code is based off this site and a Wikipedia entry:

7f7f ffff ≈ 3.4028234 × 10³⁸ (max single precision) -- from wikipedia entry, corresponds to positive overflow

Here's the code:

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cmath>

using namespace std;

int main(void) {

    float two = 2;
    float twentyThree = 23;
    float one27 = 127;
    float one49 = 149;


    float posOverflow, negOverflow, posUnderflow, negUnderflow;

    posOverflow = two - (pow(two, -twentyThree) * pow(two, one27));
    negOverflow = -(two - (pow(two, one27) * pow(two, one27)));


    negUnderflow = -pow(two, -one49);
    posUnderflow = pow(two, -one49);


    cout << "Positive overflow occurs when value greater than: " << hex << *(int*)&posOverflow << endl;


    cout << "Neg overflow occurs when value less than: " << hex << *(int*)&negOverflow << endl;


    cout << "Positive underflow occurs when value greater than: " << hex << *(int*)&posUnderflow << endl;


    cout << "Neg overflow occurs when value greater than: " << hex << *(int*)&negUnderflow << endl;

}

The output is:

Positive overflow occurs when value greater than: f3800000 Neg overflow occurs when value less than: 7f800000 Positive underflow occurs when value greater than: 1 Neg overflow occurs when value greater than: 80000001

To get the hexadecimal representation of the floating point, I am using a method described here:

Why isn't the code working? I know it'll work if positive overflow = 7f7f ffff.

Upvotes: 3

Answers (3)

Potatoswatter

Reputation: 137940

Floating point loses precision as the number gets bigger. A number of the magnitude 2¹²⁷ does not change when you add 2 to it.

Other than that, I'm not really following your code. Using words to spell out numbers makes it hard for me to read.

Here is the standard way to get the floating-point limits of your machine:

#include <limits>
#include <iostream>
#include <iomanip>

std::ostream &show_float( std::ostream &s, float f ) {
    s << f << " = ";
    std::ostream s_hex( s.rdbuf() );
    s_hex << std::hex << std::setfill( '0' );
    for ( char const *c = reinterpret_cast< char const * >( & f );
          c != reinterpret_cast< char const * >( & f + 1 );
          ++ c ) {
        s_hex << std::setw( 2 ) << ( static_cast< unsigned int >( * c ) & 0xff );
    }
    return s;
}

int main() {
    std::cout << std::hex;
    std::cout << "Positive overflow occurs when value greater than: ";
    show_float( std::cout, std::numeric_limits< float >::max() ) << '\n';
    std::cout << "Neg overflow occurs when value less than: ";
    show_float( std::cout, - std::numeric_limits< float >::max() ) << '\n';
    std::cout << "Positive underflow occurs when value less than: ";
    show_float( std::cout, std::numeric_limits< float >::denormal_min() ) << '\n';
    std::cout << "Neg underflow occurs when value greater than: ";
    show_float( std::cout, - std::numeric_limits< float >::min() ) << '\n';
}

output:

Positive overflow occurs when value greater than: 3.40282e+38 = ffff7f7f
Neg overflow occurs when value less than: -3.40282e+38 = ffff7fff
Positive underflow occurs when value less than: 1.17549e-38 = 00008000
Neg underflow occurs when value greater than: -1.17549e-38 = 00008080

The output depends on the endianness of the machine. Here the bytes are reversed due to little-endian order.

Note, "underflow" in this case isn't a catastrophic zero result, but just denormalization which gradually reduces precision. (It may be catastrophic to performance, though.) You might also check numeric_limits< float >::denorm_min() which produces 1.4013e-45 = 01000000.

Upvotes: 2

John Calsbeek

Reputation: 36547

Your expression for the highest representable positive float is wrong. The page you linked uses (2-pow(2, -23)) * pow(2, 127), and you have 2 - (pow(2, -23) * pow(2, 127)). Similarly for the smallest representable negative float.

Your underflow expressions look correct, however, and so do the hexadecimal outputs for them.

Note that posOverflow and negOverflow are simply +FLT_MAX and -FLT_MAX. But note that your posUnderflow and negUnderflow are actually smaller than FLT_MIN(because they are denormal, and FLT_MIN is the smallest positive normal float).

Upvotes: 3

aib

Reputation: 47011

Your code assumes integers have the same size as a float (so do all but a few of the posts on the page you've linked, btw.) You probably want something along the lines of:

for (size_t s = 0; s < sizeof(myVar); ++s) {
    unsigned char *byte = reinterpret_cast<unsigned char*>(myVar)[s];
    //sth byte is byte
}

that is, something akin to the templated code on that page.

Your compiler may not be using those specific IEEE 754 types. You'll need to check its documentation.

Also, consider using std::numeric_limits<float>.min()/max() or cfloat FLT_ constants for determining some of those values.

Upvotes: 1

Floating point limits code not producing correct results

Answers (3)

Related Questions