Reputation: 242
I already know how to implement conversion to half-precision floating point using truncation (thanks to this answer). But how can I do the same conversion with rounding to the nearest representable value? For example, I want 65519 to round to 0x7bff (which is 65504), not to infinity. Another example: in the linked solution 8199 is represented as 8192, but the nearest representable value to 8199 is 8200.
UPD: More example cases: I want integers between 32768 and 65519 to round to a multiple of 32, integers between 16384 and 32768 to round to a multiple of 16, and so on.
Upvotes: 4
Views: 728
Reputation: 676
You need two pieces to achieve what you want.
1. Add rounding before you do the conversion, by inserting the following ahead of the existing conversion code:
    // round the number if necessary before we do the conversion
    if (manbits > 13)
        absx += (2<<(manbits-13));
    // the addition may carry into a new top bit, so recount the bits
    manbits = 0;
    tmp = absx;
    while (tmp)
    {
        tmp >>= 1;
        manbits++;
    }
2. Change the threshold for clipping to infinity from > 15 to > 16, by changing
    if (exp + truncated > 15)
to:
    if (exp + truncated > 16)
I updated the original code: https://ideone.com/mWqgSP
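For readers who cannot open the link, here is a minimal self-contained sketch of the same two ideas (add half a unit in the last place, then clip only when the rounded value needs more than 16 bits). It is not the linked code: the names `int_to_half_round` and `bit_length` are made up for illustration, it assumes the input is a 32-bit signed integer, and it rounds ties away from zero rather than to even as IEEE 754 does by default.

```c
#include <stdint.h>

/* Hypothetical helper: number of bits needed to hold v (0 for v == 0). */
static int bit_length(uint32_t v)
{
    int n = 0;
    while (v) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical sketch: convert a 32-bit integer to IEEE 754 binary16 bits,
 * rounding to the nearest representable value (ties away from zero). */
uint16_t int_to_half_round(int32_t x)
{
    uint16_t sign = 0;
    uint32_t absx = (uint32_t)x;
    if (x < 0) {
        sign = 0x8000u;
        absx = 0u - (uint32_t)x;   /* safe even for INT32_MIN */
    }
    if (absx == 0)
        return 0;

    int bits = bit_length(absx);
    if (bits > 11) {
        /* binary16 keeps 11 significant bits, so bits - 11 low bits are
         * dropped; add half of the dropped range before truncating */
        absx += 1u << (bits - 12);
        bits = bit_length(absx);   /* the add may carry into a new top bit */
    }
    if (bits > 16)
        return (uint16_t)(sign | 0x7C00u);   /* too large: infinity */

    uint32_t exp = (uint32_t)(bits - 1) + 15u;   /* biased exponent */
    uint32_t mant = (bits > 11) ? (absx >> (bits - 11))
                                : (absx << (11 - bits));
    /* mask off the implicit leading 1, keep the 10 stored bits */
    return (uint16_t)(sign | (exp << 10) | (mant & 0x3FFu));
}
```

With this sketch, 65519 becomes 65535 after the half-unit add, which still fits in 16 bits and truncates to 0x7bff, while 65520 becomes 65536, needs 17 bits, and is clipped to infinity.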
Upvotes: 3