Divide number of bits exponent and mantissa

Question

According to the standard, converting a float to 32-bit binary uses 1 bit for the sign, 8 bits for the exponent, and the remaining 23 bits for the mantissa.

Let's say we don't want to convert to 32 bits, but to 15, 23, or any other number of bits. Is there a rule or method for dividing the available bits between the exponent and the mantissa?

For example, if we want to represent a given float in 15 bits, how many bits do we need for the exponent and for the mantissa?

Ben · Accepted Answer

By the standards, there is "Half precision" floating point, which is 16 bits in size.

The standard is IEEE 754:

https://en.wikipedia.org/wiki/IEEE_754

It defines several formats, but none with 15 or 23 bits.

If you are defining your own format, how many bits to give the exponent is essentially a design decision: more exponent bits give a wider range of values, while more mantissa bits give finer precision.

The standard defines a 16-bit format (half precision) that uses 10 bits for the mantissa (about 3 significant decimal digits) and 5 bits for the exponent, giving a range of about ±65504.

https://en.wikipedia.org/wiki/Half-precision_floating-point_format
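To make the trade-off concrete for the 15-bit case in the question, here is a minimal Python sketch that tabulates range and precision for several exponent/mantissa splits. It assumes IEEE-754-style conventions (1 sign bit, implicit leading 1, bias of 2^(e-1) - 1, all-ones exponent reserved for infinity/NaN). The names `format_properties` and `splits` are made up for this illustration, and the candidate exponent widths of 3 to 8 bits are an arbitrary choice.

```python
import math

def format_properties(exp_bits, man_bits):
    """Range and precision of an IEEE-754-style format (assumed conventions:
    implicit leading 1, bias 2**(exp_bits-1) - 1, all-ones exponent reserved)."""
    bias = 2 ** (exp_bits - 1) - 1
    max_finite = (2 - 2.0 ** -man_bits) * 2.0 ** bias  # largest finite value
    min_normal = 2.0 ** (1 - bias)                     # smallest normal value
    decimal_digits = (man_bits + 1) * math.log10(2)    # approx. significant digits
    return max_finite, min_normal, decimal_digits

def splits(total_bits, exp_choices=range(3, 9)):
    """Yield (exp_bits, man_bits) for 1 sign bit + exponent + mantissa."""
    for exp_bits in exp_choices:
        man_bits = total_bits - 1 - exp_bits
        if man_bits >= 1:
            yield exp_bits, man_bits

# Sanity check against half precision (5 exponent bits, 10 mantissa bits).
print(format_properties(5, 10)[0])  # 65504.0

# Candidate layouts for a hypothetical 15-bit format.
for exp_bits, man_bits in splits(15):
    max_finite, min_normal, digits = format_properties(exp_bits, man_bits)
    print(f"exp={exp_bits} man={man_bits:2d} "
          f"max={max_finite:.4g} min_normal={min_normal:.4g} digits={digits:.1f}")
```

Each extra exponent bit roughly squares the largest representable value, while each mantissa bit given up costs about 0.3 decimal digits, so the "right" split depends on whether your data needs range or precision.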

Here's an example of a 16-bit format that divides the bits differently: 8 for the exponent and 7 for the mantissa. It has only about 2 significant decimal digits of precision but covers essentially the same range of values as single precision, which makes it useful for different purposes than half precision:

https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
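As an illustration of how much the split matters, the following Python sketch decodes one 16-bit pattern under two different layouts. It assumes the IEEE-754-style field order (sign, then exponent, then mantissa) and the usual bias and subnormal rules. `decode` is a hypothetical helper written for this example, not a library function.

```python
def decode(bits, exp_bits, man_bits):
    """Decode an integer bit pattern laid out as sign | exponent | mantissa,
    using IEEE-754-style rules (assumed: bias 2**(exp_bits-1) - 1)."""
    bias = 2 ** (exp_bits - 1) - 1
    man = bits & ((1 << man_bits) - 1)
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    if exp == (1 << exp_bits) - 1:   # all-ones exponent: infinity or NaN
        return sign * float("inf") if man == 0 else float("nan")
    if exp == 0:                     # zero or subnormal: no implicit leading 1
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / 2 ** man_bits) * 2.0 ** (exp - bias)

pattern = 0x4248
print(decode(pattern, 5, 10))  # read as half precision: 3.140625
print(decode(pattern, 8, 7))   # read as bfloat16: 50.0
```

The same function works for any custom width, for example `decode(bits, 4, 10)` for a hypothetical 15-bit format with a 4-bit exponent.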

And here's an example of yet another 16-bit format, and also an 8 bit format:

http://www.toves.org/books/float/

And here's an example of 11-bit and 10-bit floating-point numbers with no sign bit. They are intended only for storing images, so they don't need negative values. They use 5 bits for the exponent, the same as half precision, which makes it easy to convert to and from half precision, the format graphics cards use internally:

https://bartwronski.com/2017/04/02/small-float-formats-r11g11b10f-precision/
