Kamil Grosicki

Reputation: 163

How to calculate number of mantissa bits?

I want to calculate the number of mantissa bits in float and double. I know those numbers should be 23 and 52, but I have to calculate them in my program.

Upvotes: 0

Views: 2054

Answers (2)

chqrlie

Reputation: 144949

There is an ambiguity in the number of mantissa bits: it could mean

  • the number of significant bits in the value, including the implicit leading 1 (the precision);
  • the number of bits actually stored in the floating point representation.

Typically, the mantissa as stored in the IEEE 754 binary formats does not include the leading 1 that is implied for all normal (nonzero, non-subnormal) numbers. Therefore the number of bits stored in the representation is one less than the true number of bits.

You can compute this number for the binary floating point formats in different ways:

  • the standard header <float.h> defines the constants FLT_MANT_DIG, DBL_MANT_DIG and LDBL_MANT_DIG, the number of base-FLT_RADIX digits in the significand. When FLT_RADIX is 2, the value is the true number of mantissa bits.
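  • you can derive the number of bits from FLT_EPSILON, also defined in <float.h>: FLT_EPSILON is the difference between 1.0f and the smallest float greater than 1.0f, which is 2^(1-p) for a binary format with p mantissa bits. (It is not quite the smallest value x for which 1.0f + x != 1.0f: with round-to-nearest, any x slightly larger than FLT_EPSILON / 2 already changes the sum.) The true number of mantissa bits is therefore 1 - log(FLT_EPSILON) / log(2). The same formula works for DBL_EPSILON and LDBL_EPSILON.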
  • you can compute the values with a loop, as illustrated in the code below.

Here is a test utility:

#include <float.h>
#include <math.h>
#include <stdio.h>

/* Counts mantissa bits by halving a value until adding it to 1 no longer
   changes the sum. With FLT_RADIX == 2 and round-to-nearest, the loop
   stops at 2^-p, so n ends up equal to p, the number of mantissa bits
   including the implicit leading 1.
   Each sum is stored in a volatile variable so it is rounded to the target
   type even when the compiler evaluates expressions in wider precision
   (FLT_EVAL_METHOD != 0, e.g. x87 code). */
int main(void) {
    int n;

    volatile float fsum;
    float f = 1.0f;
    for (n = 0; ; n++) {
        fsum = 1.0f + f;
        if (fsum == 1.0f)
            break;
        f /= 2;
    }
    printf("1 - log(FLT_EPSILON)/log(2) =  %g\n", 1 - log(FLT_EPSILON) / log(2));
    printf("Mantissa bits for float:       %d\n", n);

    volatile double dsum;
    double d = 1.0;
    for (n = 0; ; n++) {
        dsum = 1.0 + d;
        if (dsum == 1.0)
            break;
        d /= 2;
    }
    printf("1 - log(DBL_EPSILON)/log(2) =  %g\n", 1 - log(DBL_EPSILON) / log(2));
    printf("Mantissa bits for double:      %d\n", n);

    volatile long double ldsum;
    long double ld = 1.0L;
    for (n = 0; ; n++) {
        ldsum = 1.0L + ld;
        if (ldsum == 1.0L)
            break;
        ld /= 2;
    }
    printf("1 - log(LDBL_EPSILON)/log(2) = %g\n", 1 - log(LDBL_EPSILON) / log(2));
    printf("Mantissa bits for long double: %d\n", n);
    return 0;
}

Output on my laptop:

1 - log(FLT_EPSILON)/log(2) =  24
Mantissa bits for float:       24
1 - log(DBL_EPSILON)/log(2) =  53
Mantissa bits for double:      53
1 - log(LDBL_EPSILON)/log(2) = 64
Mantissa bits for long double: 64

Upvotes: 3

Captain Giraffe

Reputation: 14705

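The header <cfloat> (<float.h> in C) defines constants you can use.

See FLT_MANT_DIG, for example. FLT_MANT_DIG, DBL_MANT_DIG and LDBL_MANT_DIG give the number of base-FLT_RADIX digits in the significand, counting the implicit leading bit, so they are 24 and 53 for IEEE 754 float and double rather than the 23 and 52 stored bits.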

Upvotes: 5
