Raymond Hettinger
Raymond Hettinger

Reputation: 226296

Some C Floating Point Constants Don't Make Sense

The constants in <float.h> for Apple clang version 12.0.0 (clang-1200.0.32.2) don't seem to make sense.

DBL_MIN_EXP is -1021 and DBL_MAX_EXP is 1024. However, that doesn't match what wikipedia says, "exponents range from −1022 to +1023, ..."

Also DBL_MIN_EXP seems inconsistent with DBL_MIN which is 2.2250738585072014e-308 which is equal to 2⁻¹⁰²² sometimes written as 0x1.0000000000000p-1022. So, we have an exponent smaller than the minimum -1021.

Likewise, DBL_MIN_10_EXP is -307 which doesn't sense given that DBL_MIN has an exponent of e-308.

The double DBL_MAX_EXP value of 1024 overflows when used in real code. For example, ldexp(1.0, 1024) gives inf.

Here's my C code:

#include <float.h>
#include <stdio.h>
#include <math.h>

#define SHOW_DOUBLE(s)   printf("%.17lg \t%s\n", s, #s);
#define SHOW_INT(s)      printf("%d \t%s\n", s, #s);

int
main()
{
    SHOW_DOUBLE(DBL_MAX);
    SHOW_DOUBLE(DBL_MIN);
    SHOW_DOUBLE(DBL_EPSILON);
    SHOW_INT(DBL_MAX_EXP);
    SHOW_INT(DBL_MAX_10_EXP);
    SHOW_INT(DBL_MIN_EXP);
    SHOW_INT(DBL_MIN_10_EXP);
    SHOW_INT(DBL_DIG);
    SHOW_INT(DBL_MANT_DIG);
    SHOW_INT(FLT_RADIX);
    SHOW_INT(FLT_ROUNDS);
    printf("%lf\n", ldexp(1.0, 1024));
    return 0;
}

And here is the output:

1.7976931348623157e+308 DBL_MAX
2.2250738585072014e-308 DBL_MIN
2.2204460492503131e-16  DBL_EPSILON
1024                    DBL_MAX_EXP
308                     DBL_MAX_10_EXP
-1021                   DBL_MIN_EXP
-307                    DBL_MIN_10_EXP
15                      DBL_DIG
53                      DBL_MANT_DIG
2                       FLT_RADIX
1                       FLT_ROUNDS
inf

Upvotes: 8

Views: 340

Answers (2)

G&#233;ry Ogam
G&#233;ry Ogam

Reputation: 8027

I asked myself the same question and realized that this stems from the fact that IEEE 754 and C use two different normalized forms:

  • IEEE 754 uses a significand between 1 (inclusive) and 2 (exclusive) for normal numbers:

x = sign × significand × 2exponent, encoded as S|E|F where sign = (−1)S; significand =

  • 1.F if EminEEmax (normal number),
  • 0 if E = Emin − 1 and F = 0 (zero),
  • 0.F if E = Emin − 1 and F ≠ 0 (subnormal number),
  • ∞ if E = Emax + 1 and F = 0 (infinity),
  • NaN if E = Emax + 1 and F ≠ 0 (not a number);

exponent = max(E, Emin) − exponent bias.

  • C uses a significand′ between 0.5 (inclusive) and 1 (exclusive) for normal numbers:

x = sign′ × significand′ × 2exponent′, encoded as S|E|F where sign′ = (−1)S; significand′ =

  • 0.1F if EminEEmax (normal number),
  • 0 if E = Emin − 1 and F = 0 (zero),
  • 0.0F if E = Emin − 1 and F ≠ 0 (subnormal number),
  • ∞ if E = Emax + 1 and F = 0 (infinity),
  • NaN if E = Emax + 1 and F ≠ 0 (not a number);

exponent′ = max(E + 1, Emin + 1) − exponent bias.

Since x = sign × significand × 2exponent = sign′ × significand′ × 2exponent′, the following relations between the IEEE 754 normalized form and the C normalized form hold:

  • sign′ = sign;
  • significand′ = significand/2;
  • exponent′ = exponent + 1.

In particular, in binary64 format that is why exponentmin = −1022 in IEEE 754 normalized form and exponent′min = −1021 in C normalized form (DBL_MIN_EXP), and exponentmax = 1023 in IEEE 754 normalized form and exponent′max = 1024 in C normalized form (DBL_MAX_EXP).

In Python:

  • the IEEE 754 normalized significand and exponent can be extracted with:
math.ldexp(x, -math.floor(math.log2(x))), math.floor(math.log2(x))
  • the C normalized significand′ and exponent′ can be extracted with:
math.frexp(x)

For instance:

>>> import math
>>> x = 12
>>> math.ldexp(x, -math.floor(math.log2(x))), math.floor(math.log2(x))
(1.5, 3)
>>> math.frexp(12)
(0.75, 4)

Upvotes: 1

R.. GitHub STOP HELPING ICE
R.. GitHub STOP HELPING ICE

Reputation: 215261

The off-by-one is part of the spec. From 5.2.4.2.2 Characteristics of floating types <float.h>, ¶11,

...

  • minimum negative integer such that FLT_RADIX raised to one less than that power is a normalized floating-point number, emin
    • FLT_MIN_EXP
    • DBL_MIN_EXP
    • LDBL_MIN_EXP

...

  • maximum integer such that FLT_RADIX raised to one less than that power is a representable finite floating-point number, emax
    • FLT_MAX_EXP
    • DBL_MAX_EXP
    • LDBL_MAX_EXP

Emphasis on one less than is mine.

Upvotes: 9

Related Questions