silverscania
silverscania

Reputation: 707

Why does adding 1 to numeric_limits<float>::min() return 1?

How come subtracting 1 from float max returns a sensible value, but adding 1 to float min returns 1?

I thought that if you added or subtracted a value smaller than the epsilon for that particular magnitude, then nothing would happen and there would be no increase or decrease.

Here is the code I compiled with g++ with no flags and ran on x86_64.

#include <limits>
#include <iostream>

int main() {
    float min = std::numeric_limits<float>::min() + 1;
    float max = std::numeric_limits<float>::max() - 1;

    std::cout << min << std::endl << max << std::endl;
    return 0;
}

Outputs this:

1
3.40282e+38

I would expect it to output this:

-3.40282e+38
 3.40282e+38

Upvotes: 0

Views: 360

Answers (2)

Peter Cordes
Peter Cordes

Reputation: 364170

min is the smallest-magnitude positive normalized float, a very tiny positive number (about 1.17549e-38), not a negative number with large magnitude. Notice that the - is in the exponent, and this is scientific notation. e-38 means 38 zeros after the decimal point. Try it out on https://www.h-schmidt.net/FloatConverter/IEEE754.html to play with the bits in a binary float.

std::numeric_limits<float>::min() is the minimum magnitude normalized float, not -max. CppReference even has a note about this possibly being surprising.

Do you know why that was picked to be the value for min() rather than the lowest negative value? Seems to be an outlier with regards to all the other types.

Some of the sophistication in numeric_limits<T> like lowest and denorm_min is new in C++11. Most of the choice of what to define mostly followed C. Historical C valued economy and didn't define a lot of different names. (Smaller is better on ancient computers, and also less stuff in the global namespace which is all C had access to.)

Float types are normally1 symmetric around 0 (sign/magnitude representation), so C didn't have a separate named constant for the most-negative float / double / long double. Just FLT_MAX and FLT_MIN CPP macros. C doesn't have templates, so you know when you're writing FP code and can use a - on the appropriate constant if necessary.

If you're only going to have a few named constants, the three most interesting ones are:

  • FLT_EPSILON tells you about the available precision (mantissa bits): nextafter(1.0, +INF) - 1.0
  • FLT_MIN / FLT_MAX min (normalized) and max magnitudes of finite floats. This depends mostly on how many exponent bits a float has.

    They're not quite symmetric around 1.0 for 2 reasons: all-ones mantissa in FLT_MAX, and gradual underflow (subnormals) taking up the lowest exponent-field (0 with bias), but FLT_MIN ignoring subnormals. FLT_MIN * FLT_MAX is about 3.99999976 for IEEE754 binary32 float. (You normally want to avoid subnormals for performance reasons, and so you have room for gradual underflow, so it makes sense that FLT_MIN isn't denorm_min)

(Fun fact: 0.0 is a special case of a subnormal: exponent field = 0 implying a mantissa of 0.xxx instead of 1.xxx).

Footnote 1: CppReference points out that C++11 std::numeric_limits<T>::lowest() could be different from -max for 3rd-party FP types, but isn't for standard C++ FP types.

lowest is what you wanted: the most-negative finite value. It's consistent across integer and FP types as being the most-negative value, so for example you could use it as an initializer for a templated search loop that uses std::min to find the lowest value in an array.

C++11 also introduced denorm_min, the minimum positive subnormal aka denormal value for FP types. In IEEE754, the object representation has all bits 0 except for a 1 in the low bit of the mantissa.


The float result for 1.0 + 1.17549e-38 (after rounding to the nearest float) is exactly 1.0. min is lower than std::numeric_limits<float>::epsilon so the entire change is lost to rounding error when added to 1.0.

So even if you did print the float with full precision (or as a hex float), it would be 1.0. But you're just printing with the default formatting for cout which rounds to some limited precision, like 6 decimal digits. https://en.cppreference.com/w/cpp/io/manip/setprecision

(An earlier version of the question included the numeric value of min ~= 1.17549e-38; this answer started out addressing that mixup and I haven't bothered to fully rewrite those parts).

Upvotes: 6

JohnFilleau
JohnFilleau

Reputation: 4288

std::numeric_limits<float>::min() returns the smallest normalized positive value. To get the value that has no value lower than it, use std::numeric_limits<float>::lowest().

https://en.cppreference.com/w/cpp/types/numeric_limits/min

Upvotes: 6

Related Questions