envy grunt

Reputation: 242

Convert integer to half-precision floating point format using round-to-even

I already know how to convert an integer to half-precision floating point using truncation (thanks to this answer). But how can I do the same conversion with rounding to the nearest representable value? For example, I want 65519 to round to 0x7bff (which is 65504), not to infinity. Another example: in the linked solution 8199 is represented as 8192, but the nearest representable value to 8199 is 8200.

UPD: More example cases: I want integers between 32768 and 65519 to round to a multiple of 32, integers between 16384 and 32768 to round to a multiple of 16, and so on.

Upvotes: 4

Views: 728

Answers (1)

hko

Reputation: 676

You need two pieces to achieve what you want.

1. Add rounding before you do the conversion

  by adding:

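  // round the number if necessary before we do the conversion:
  // 2 << (manbits-13) is half of the lowest bit that survives truncation
  if (manbits > 13)
    absx += (2<<(manbits-13));

  // the addition may have carried into a new top bit, so recount the bits
  manbits = 0;
  tmp = absx;
  while (tmp)
  {
    tmp >>= 1;
    manbits++;
  }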

  before you do the conversion.

2. Change the threshold for clipping to infinity from > 15 to > 16

  by changing

  if (exp + truncated > 15)

  to:

  if (exp + truncated > 16)

I updated the original code: https://ideone.com/mWqgSP
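For comparison, here is a minimal self-contained sketch of the whole conversion. It is not the code behind the link. The names `int_to_half_rne` and `bit_length` are made up for this illustration, and it assumes a signed 32-bit input and the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits with bias 15, 10 stored mantissa bits). As far as I can tell, adding half of the last kept bit and then truncating, as in the snippet above, rounds exact ties upward. This sketch breaks ties toward the even mantissa instead, which is what the title asks for.

```c
#include <stdint.h>

/* Number of significant bits in v (0 for v == 0). */
static int bit_length(uint32_t v)
{
    int n = 0;
    while (v) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: convert an integer to binary16 bits,
   rounding to nearest with ties going to the even mantissa. */
uint16_t int_to_half_rne(int32_t x)
{
    uint16_t sign = (x < 0) ? 0x8000u : 0u;
    /* Unsigned negation is well defined even for INT32_MIN. */
    uint32_t absx = (x < 0) ? 0u - (uint32_t)x : (uint32_t)x;

    if (absx == 0)
        return 0;

    /* 65520 is the midpoint between 65504 (largest finite half) and
       65536; it and everything above it rounds to infinity. */
    if (absx >= 65520u)
        return (uint16_t)(sign | 0x7C00u);

    int nbits = bit_length(absx);
    if (nbits > 11) {
        int shift = nbits - 11;               /* low bits that do not fit */
        uint32_t lsb = (absx >> shift) & 1u;  /* lowest bit that is kept */
        /* Adding (half - 1 + lsb) carries into the kept bits when the
           dropped part is above half, or exactly half with lsb odd. */
        absx += (1u << (shift - 1)) - 1u + lsb;
        nbits = bit_length(absx);             /* rounding may add a bit */
    }

    /* Align the leading 1 to bit 10; it becomes the implicit bit. */
    uint32_t mant = (nbits > 11) ? absx >> (nbits - 11)
                                 : absx << (11 - nbits);
    uint32_t exp = (uint32_t)(nbits - 1 + 15);

    return (uint16_t)(sign | (exp << 10) | (mant & 0x3FFu));
}
```

With this sketch, `int_to_half_rne(65519)` gives 0x7BFF (65504), `int_to_half_rne(8199)` gives 0x7001 (8200), and `int_to_half_rne(65520)` gives 0x7C00 (infinity). The ties 8196 and 8204 go to 8192 and 8208 respectively, the neighbours with an even mantissa.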

Upvotes: 3
