chux

Reputation: 154070

`strtof()` conversion error by more than 0.5 ULP

Why does strtof() unexpectedly convert "3.40282356779733650000e38" to infinity even though it is within 0.5 ULP of FLT_MAX?


FLT_MAX (float32) is 0x1.fffffep+127 or about 3.4028234663852885981170e+38.

1/2 ULP above FLT_MAX is 0x1.ffffffp+127 or about 3.4028235677973366163754e+38, so I expected any decimal text between FLT_MAX and this value to convert to FLT_MAX when in "round to nearest" mode.
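As a cross-check on that arithmetic, here is a minimal sketch that builds the same value from powers of two. It assumes IEEE 754 binary32 `float` and binary64 `double`; the function name `flt_max_plus_half_ulp` is made up for illustration.

```c
#include <assert.h>
#include <float.h>

/* Half a float ULP above FLT_MAX, computed exactly in double.
   Assumes IEEE 754 binary32 float and binary64 double. */
double flt_max_plus_half_ulp(void) {
  /* FLT_MAX = (2 - 2^-23) * 2^127, so one float ULP there is
     2^(127-23) = 2^104 and half an ULP is 2^103. */
  const double half_ulp = 0x1p103;
  double t = (double)FLT_MAX + half_ulp;  /* exact: the sum needs only 25 bits */
  assert(t > FLT_MAX);
  return t;
}
```

Under those assumptions the result compares equal to the hex constant 0x1.ffffffp+127 quoted above.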

This works as the decimal text increases from FLT_MAX to about 3.40282356779733642700e38, yet for decimal text just above that, such as "3.40282356779733650000e38", the conversion result is infinity.

The following code reveals the issue. It gently creeps up a decimal text string, looking for the value at which the conversion changes to infinity.
Your results may differ, as not all C implementations use the same floating point.

#include <assert.h>
#include <errno.h>   // errno
#include <float.h>
#include <math.h>    // nextafter(), nextafterf()
#include <stdio.h>
#include <stdlib.h>

void bar(unsigned n) {
  char buf[100];
  assert (n < 90);
  int len = sprintf(buf, "%.*fe%d", n+1, 0.0, FLT_MAX_10_EXP);
  puts(buf);
  printf("%-*s   %-*s       %s\n", len, "string", n+3, "float", "double");
  float g = 0;
  for (unsigned i = 0; i < n; i++) {
    for (int digit = '1'; digit <= '9'; digit++) {
      unsigned offset = i ? 1+i : i;
      buf[offset]++;
      errno = 0;
      float f = strtof(buf, 0);
      if (errno) {
        buf[offset]--;
        break;
      }
      g = f;
    }
    printf("\"%s\" %.*e %a\n", buf, n + 3, g, atof(buf));
  }
  double delta = FLT_MAX - nextafterf(FLT_MAX, 0);
  double flt_max_ulp_d2 = FLT_MAX + delta/2.0;
  printf(" %.*e %a FLT_MAX + 1/2 ULP - 1 dULP\n", n + 3, nextafter(flt_max_ulp_d2,0),nextafter(flt_max_ulp_d2,0));
  printf(" %.*e %a FLT_MAX + 1/2 ULP\n", n + 3, flt_max_ulp_d2,flt_max_ulp_d2);
  printf(" %.*e %a FLT_MAX\n", n + 3, FLT_MAX, FLT_MAX);
  printf(" 1 23456789 123456789 123456789\n");
  printf("FLT_ROUNDS %d  (0: toward zero, 1: to nearest)\n", FLT_ROUNDS);
}

int main() {
  printf("%a %.20e\n", FLT_MAX, FLT_MAX);
  printf("%a\n", strtof("3.40282356779733650000e38", 0));
  printf("%a\n", strtod("3.40282356779733650000e38", 0));
  printf("%a\n", strtod("3.4028235677973366163754e+3", 0));
  bar(19);
}

Output

0x1.fffffep+127 3.40282346638528859812e+38
inf
0x1.ffffffp+127
0x1.a95a5aaada733p+11
0.00000000000000000000e38
string                      float                        double
"3.00000000000000000000e38" 3.0000000054977557577780e+38 0x1.c363cbf21f28ap+127
"3.40000000000000000000e38" 3.3999999521443642490773e+38 0x1.ff933c78cdfadp+127
"3.40000000000000000000e38" 3.3999999521443642490773e+38 0x1.ff933c78cdfadp+127
"3.40200000000000000000e38" 3.4020000005553803402978e+38 0x1.ffe045fe9918p+127
"3.40280000000000000000e38" 3.4027999387901483621794e+38 0x1.ffff169a83f08p+127
"3.40282000000000000000e38" 3.4028200183756559773331e+38 0x1.ffffdbd19d02cp+127
"3.40282300000000000000e38" 3.4028230607370965250836e+38 0x1.fffff966ad924p+127
"3.40282350000000000000e38" 3.4028234663852885981170e+38 0x1.fffffe54daff8p+127
"3.40282356000000000000e38" 3.4028234663852885981170e+38 0x1.fffffeec5116ep+127
"3.40282356700000000000e38" 3.4028234663852885981170e+38 0x1.fffffefdfcbbcp+127
"3.40282356770000000000e38" 3.4028234663852885981170e+38 0x1.fffffeffc119p+127
"3.40282356779000000000e38" 3.4028234663852885981170e+38 0x1.fffffefffb424p+127
"3.40282356779700000000e38" 3.4028234663852885981170e+38 0x1.fffffeffffc85p+127
"3.40282356779730000000e38" 3.4028234663852885981170e+38 0x1.fffffefffff9fp+127
"3.40282356779733000000e38" 3.4028234663852885981170e+38 0x1.fffffefffffeep+127
"3.40282356779733600000e38" 3.4028234663852885981170e+38 0x1.fffffeffffffep+127

"3.40282356779733640000e38" 3.4028234663852885981170e+38 0x1.fffffefffffffp+127 <-- Actual
"3.40282356779733660000e38" 3.4028234663852885981170e+38 ...                    <-- Expected

"3.40282356779733642000e38" 3.4028234663852885981170e+38 0x1.fffffefffffffp+127
"3.40282356779733642700e38" 3.4028234663852885981170e+38 0x1.fffffefffffffp+127
 3.4028235677973362385861e+38 0x1.fffffefffffffp+127 FLT_MAX + 1/2 ULP - 1 dULP
 3.4028235677973366163754e+38 0x1.ffffffp+127 FLT_MAX + 1/2 ULP
 3.4028234663852885981170e+38 0x1.fffffep+127 FLT_MAX
 1 23456789 123456789 123456789
FLT_ROUNDS 1  (0: toward zero, 1: to nearest)

Notes: GNU C11 (GCC) version 11.3.0 (x86_64-pc-cygwin) compiled by GNU C version 11.3.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

[Edit] The exact value of FLT_MAX + 1/2 ULP: 0x1.ffffffp+127 340282356779733661637539395458142568448.0

I stumbled on this problem today when trying to determine the maximum decimal text passed to strtof() that returned a finite float.

Upvotes: 7

Views: 131

Answers (1)

chux

Reputation: 154070

This is a "Can I answer my own question?" answer. Other answers are welcome.

Why does strtof() unexpectedly convert "3.40282356779733650000e38" to infinity even though it is within 0.5 ULP of FLT_MAX?

Certainly double rounding.
"Double" here refers to doing something twice, not the type double.

Call the value 1/2 of a float ULP above FLT_MAX, 0x1.ffffffp+127 or about 3.4028235677973366163754e+38, the threshold.

About 3.40282356779733642748e38 is one half of a double ULP below threshold. Apparently values above that, like "3.40282356779733650000e38", are first rounded as a double up to threshold. threshold, as a float, is half-way between FLT_MAX and the next larger float (if the encoding were extended). Being a half-way tie, it rounds to the "even" value, which is the larger one in this case. Since the next larger float is beyond the max encodable finite value, the result is infinity.
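The two-step path described above can be reproduced on purpose. This is only a sketch of the suspected mechanism, not the library's actual code. It assumes IEEE 754 types, round-to-nearest mode, a correctly rounded strtod(), and Annex F style overflow to infinity on the double-to-float conversion. The helper names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdlib.h>

/* Two-step conversion: decimal text -> double -> float.
   Each step rounds, so the text is rounded twice. */
float via_double(const char *s) {
  assert(s != NULL);
  double d = strtod(s, NULL);  /* first rounding: to a 53-bit significand */
  return (float)d;             /* second rounding: to a 24-bit significand */
}

/* Returns 1 if the two-step path yields infinity, else 0. */
int overflows_via_double(const char *s) {
  return isinf(via_double(s)) != 0;
}
```

With those assumptions, via_double("3.40282356779733640000e38") stays at FLT_MAX, while via_double("3.40282356779733650000e38") lands exactly on the threshold as a double and then ties up to infinity, which matches the behavior reported in the question.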

Conclusions

  • A better strtof() would correctly handle this corner case.

  • Instead, it is reasonable to consider decimal places past FLT_DECIMAL_DIG + 3 (see following) in strtof() as noise.


In an alternative strtof() implementation, IEEE 754 allows such decimal text conversions to treat all decimal digits past a certain significance as zero, thus allowing conversion to the 2nd closest float when the text is near the half-way point between 2 floats. With common float, that significance is FLT_DECIMAL_DIG + 3, or 12 significant decimal digits. That is not what happens here, as digits in the 19th place affect the result.
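To make the "later digits are noise" idea concrete, here is a rough sketch of a wrapper that zeroes every significant digit past a chosen count before calling strtof(). The name `strtof_zero_tail` and its `keep` parameter are invented for this example, and it only handles plain decimal text (no hex floats, no special treatment of "inf" or "nan").

```c
#include <assert.h>
#include <ctype.h>
#include <float.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replace every significant decimal digit past the
   first `keep` with '0', then convert with strtof().
   Assumes plain decimal input such as "3.4028e38". */
float strtof_zero_tail(const char *s, int keep) {
  assert(s != NULL && keep > 0);
  char buf[128];
  size_t len = strlen(s);
  if (len >= sizeof buf) return strtof(s, NULL);  /* too long: convert as-is */
  memcpy(buf, s, len + 1);

  int seen = 0;     /* significant digits counted so far */
  int started = 0;  /* becomes 1 at the first non-zero digit */
  for (char *p = buf; *p && *p != 'e' && *p != 'E'; p++) {
    if (!isdigit((unsigned char)*p)) continue;  /* sign, space or '.' */
    if (*p != '0') started = 1;
    if (!started) continue;                     /* skip leading zeros */
    if (++seen > keep) *p = '0';                /* digit is past the limit */
  }
  return strtof(buf, NULL);
}
```

Calling strtof_zero_tail("3.40282356779733650000e38", FLT_DECIMAL_DIG + 3) converts "3.40282356779000000000e38" instead, which is far enough below the threshold to give FLT_MAX whether or not the underlying strtof() rounds twice. The trade-off is that inputs genuinely within the discarded digits of a half-way point may convert to the 2nd closest float.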

Upvotes: 4
