everchav
everchav

Reputation: 31

Convert integer in a new floating point format

This code is intended to convert a signed 16-bit integer to a new floating point format (similar to the normal IEEE 754 floating point format). I unterstand the regular IEEE 754 floating point format, but i don't understand how this code works and how this floating point format looks like. I would be grateful for some insights into what the idea of the code is respectively how many bits are used for representing the significand and how many bits are used for representing the exponent in this new format.

#include <stdint.h>
uint32_t short2fp (int16_t inp)
{
  int x, f, i;
  if (inp == 0)
    {
      return 0;
    }
  else if (inp < 0)
    {
      i = -inp;
      x = 191;
    }
  else
    {
      i = inp;
      x = 63;
    }
  for (f = i; f > 1; f >>= 1)
    x++;
  for (f = i; f < 0x8000; f <<= 1);
  return (x * 0x8000 + f - 0x8000);
}

Upvotes: 0

Views: 547

Answers (3)

njuffa
njuffa

Reputation: 26085

As a comment suggested, I would use a small number of strategically selected test cases to reverse engineer the format. The following assumes an IEEE-754-like binary floating-point format using sign-magnitude encoding with a sign bit, exponent bits, and significand (mantissa) bits.

short2fp (1) = 001f8000 while short2fp (-1) = 005f8000. The exclusive OR of these is 0x00400000 which means the sign bit is in bit 22 and this floating-point format comprises 23 bits.

short2fp (1) = 001f8000, short2fp (2) = 00200000, and short2fp (4) = 00208000. The difference between consecutive values is 0x00008000 so the least significant bit of the exponent field is bit 15, the exponent field comprises 7 bits, and the exponent is biased by (0x001f8000 >> 15) = 0x3F = 63.

This leaves the least significant 15 bits for the significand. We can see from short2fp (2) = 00200000 that the integer bit of the significand (mantissa) is not stored, that is, it is implicit as in IEEE-754 formats like binary32 or binary64.

Upvotes: 3

Eric Postpischil
Eric Postpischil

Reputation: 222274

The two statements x = 191; and x = 63; set x to either 1•128 + 63 or 0•128 + 63, according to whether the number is negative or positive. Therefore 128 (27) has the sign bit at this point. As x is later multiplied by 0x8000 (215), the sign bit is 222 in the result.

These statements also initialize the exponent to 0, which is encoded as 63 due to a bias of 63. This follows the IEEE-754 pattern of using a bias of 2n−1−1 for an exponent field of n bits. (The “single” format has eight exponent bits and a bias of 27−1 = 127, and the “double” format has 11 exponent bits and a bias of 210−1 = 1023.) Thus we expect an exponent field of 7 bits, with bias 26−1 = 63.

This loop:

for (f = i; f > 1; f >>= 1)
    x++;

detects the magnitude of i (the absolute value of the input), adding one to the exponent for each power of two that f is detected to exceed. For example, if the input is 4, 5, 6, or 7, the loop executes two times, adding two to x and reducing f to 1, at which point the loop stops. This confirms the exponent bias; if i is 1, x is left as is, so the initial value of 63 corresponds to an exponent of 0 and a represented value of 20 = 1.

The loop for (f = i; f < 0x8000; f <<= 1); scales f in the opposite direction, moving its leading bit to be in the 0x8000 position.

In return (x * 0x8000 + f - 0x8000);, x * 0x8000 moves the sign bit and exponent field from their initial positions (bit 7 and bits 6 to 0) to their final positions (bit 22 and bits 21 to 15). f - 0x8000 removes the leading bit from f, giving the trailing bits of the significand. This is then added to the final value, forming the primary encoding of the significand in bits 14 to 0.

Thus the format has the sign bit in bit 22, exponent bits in bits 21 to 15 with a bias of 63, and the trailing significand bits in bits 14 to 0.

The format could encode subnormal numbers, infinities, and NaNs in the usual way, but this is not discernible from the code shown, as it encodes only integers in the normal range.

Upvotes: 3

Luca Polito
Luca Polito

Reputation: 2862

This couple of tricks should help you recognize the parameters (exponent's size and mantissa's size) of a custom floating-point format:

  1. First of all, how many bits is this float number long?

    We know that the sign bit is the highest bit set in any negative float number. If we calculate short2fp(-1) we obtain 0b10111111000000000000000, that is a 23-bit number. Therefore, this custom float format is a 23-bit float.

  2. If we want to know the exponent's and mantissa's sizes, we can convert the number 3, because this will set both the highest exponent's bit and the highest mantissa's bit. If we do short2fp(3), we obtain 0b01000000100000000000000, and if we split this number we get 0 1000000 100000000000000: the first bit is the sign, then we have 7 bits of exponent, and finally 15 bits of mantissa.

Conclusion:

Float format size: 23 bits
Exponent size:     7 bits
Mantissa size:     15 bits

NOTE: this conclusion may be wrong for a different number of reasons (e.g.: float format particularly different from IEEE754 ones, short2fp() function not working properly, too much coffee this morning, etc.), but in general this works for every binary floating-point format defined by IEEE754 (binary16, binary32, binary64, etc.) so I'm confident this works for your custom float format too.

P.S.: the short2fp() function is written very poorly, you may try improve its clearness if you want to investigate the inner workings of the function.

Upvotes: 3

Related Questions