Bill Sellers

Finding out whether conversion from decimal to binary floating point is exact, rounding up, or rounding down

I know that many values that can be represented exactly in decimal floating point cannot be represented exactly in binary floating point. This is easy to demonstrate in, e.g., Python:

>>> a = float("0.5")
>>> print('%.17e' % a)
5.00000000000000000e-01
>>> a = float("0.054")
>>> print('%.17e' % a)
5.39999999999999994e-02
>>> a = float("0.055")
>>> print('%.17e' % a)
5.50000000000000003e-02

That is entirely expected and perfectly proper. What I want to know is whether there is an easy way to find out if the converted value is higher or lower than the exact value (or whether the conversion is exact). With the examples above I could obviously parse the output string, but that seems extraordinarily clumsy. This is such a useful thing to know (for example, if you are incrementing a floating-point number in a loop, you can use it to decide whether you will get the expected number of iterations) that I am hoping there is a more straightforward way of doing it. I'm only using Python as an example here; I'd prefer a language-agnostic solution.
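
In Python specifically, a direct check without string parsing is possible. The sketch below assumes the input is a decimal string that converts to a finite float; `rounding_direction` is a hypothetical helper name. `decimal.Decimal(float(s))` yields the exact decimal expansion of the stored double, and `Decimal(s)` keeps the decimal string exactly, so comparing the two involves no further rounding.

```python
from decimal import Decimal


def rounding_direction(s: str) -> int:
    """Return 1 if float(s) rounded up, -1 if it rounded down, 0 if exact.

    Assumes s is a decimal string that converts to a finite float.
    """
    exact = Decimal(s)          # the decimal string, kept exactly
    stored = Decimal(float(s))  # exact decimal expansion of the stored double
    if stored > exact:          # Decimal comparisons do not round
        return 1
    if stored < exact:
        return -1
    return 0


for s in ("0.5", "0.054", "0.055"):
    # Printing Decimal(float(s)) shows every digit of the stored binary value.
    print(s, rounding_direction(s), Decimal(float(s)))
```

For the three inputs above this gives 0 for "0.5", -1 for "0.054" and 1 for "0.055", matching the 17-digit output.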

Answers (1)

chux

In some languages, under certain conditions, code can control the rounding mode used by the conversion: to nearest, upward, downward (or toward 0, or ...).

One can then compare the result of the default conversion (typically round-to-nearest) with the results of rounding up and rounding down.

#include <assert.h>
#include <fenv.h>
#include <stdio.h>
#include <stdlib.h>  // strtod

// Convert s using rounding direction round_dir, then restore the caller's mode.
double string_to_double(int round_dir, const char *s) {
  #pragma STDC FENV_ACCESS ON
  int save_round = fegetround();
  int setround_ok = fesetround(round_dir);
  assert(setround_ok == 0);
  (void) setround_ok;  // avoid an unused-variable warning when NDEBUG is defined
  char *endptr;
  double d = strtod(s, &endptr);
  fesetround(save_round);
  return d;
}

// Return 1: rounded up, -1: rounded down, 0: exact
int updn(const char *s) {
  double up = string_to_double(FE_UPWARD, s);
  double nr = string_to_double(FE_TONEAREST, s);
  double dn = string_to_double(FE_DOWNWARD, s);
  // Other modes: FE_TOWARDZERO

  printf("%.17e, %.17e, %.17e, ", dn, nr, up);
  if (up == dn) {  // exactly representable: all modes agree
    assert(up == nr);
    return 0;
  }
  if (up > nr) return -1;  // nearest equals the rounded-down value
  if (dn < nr) return 1;   // nearest equals the rounded-up value
  return 0;  // Unexpected, unless NaN
}

int main(void) {
  printf("%2d\n", updn("0.5"));
  printf("%2d\n", updn("0.054"));
  printf("%2d\n", updn("0.055"));
  return 0;
}

Output (the three values are the results of rounding down, to nearest, and up; the final number is 1: up, -1: down, 0: exact):

5.00000000000000000e-01, 5.00000000000000000e-01, 5.00000000000000000e-01,  0
5.39999999999999994e-02, 5.39999999999999994e-02, 5.40000000000000063e-02, -1
5.49999999999999933e-02, 5.50000000000000003e-02, 5.50000000000000003e-02,  1
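
The same down/nearest/up comparison can be sketched in Python, with one assumption to flag: Python's `float()` always rounds to nearest and offers no way to change the rounding mode, so the hypothetical `bracket` helper below does not switch modes. Instead it derives the rounded-down and rounded-up results from the round-to-nearest result with `math.nextafter` (Python 3.9+), using an exact `fractions.Fraction` comparison to decide which neighbouring double is needed. `updn` follows the decision logic of the C function above. Finite inputs are assumed.

```python
import math
from fractions import Fraction


def bracket(s: str) -> tuple[float, float, float]:
    """Return (dn, nr, up): s rounded down, to nearest, and up to a double.

    Assumes s is a decimal string whose value converts to a finite double.
    """
    nr = float(s)          # Python always rounds to nearest here
    exact = Fraction(s)    # exact rational value of the decimal string
    stored = Fraction(nr)  # exact rational value of the double
    if stored == exact:    # exactly representable: all modes agree
        return nr, nr, nr
    if stored > exact:     # nearest rounded up, so round-down is one step below
        return math.nextafter(nr, -math.inf), nr, nr
    return nr, nr, math.nextafter(nr, math.inf)  # nearest rounded down


def updn(s: str) -> int:
    """Return 1: rounded up, -1: rounded down, 0: exact."""
    dn, nr, up = bracket(s)
    print("%.17e, %.17e, %.17e, " % (dn, nr, up), end="")
    if up == dn:
        return 0
    return -1 if up > nr else 1


for s in ("0.5", "0.054", "0.055"):
    print("%2d" % updn(s))
```

Deriving the directed results from nearest relies on the exact value lying strictly between two adjacent doubles whenever the conversion is inexact, so one `nextafter` step is always enough.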
