Reputation: 318
Is there a fast and clean way to convert an int32_t (or larger) to the largest value representable in float that is not larger than the original value stored in the int32_t?
According to the IEEE 754 standard (which I have only read about on Wikipedia, https://en.wikipedia.org/wiki/Single-precision_floating-point_format), large integers are converted by rounding to the nearest multiple of a power of 2. Which power depends on the magnitude of the value.
However, I would like to know whether it is possible to convert to the "largest float not larger" instead, and to do so cleanly without complicated constructs, ideally by setting some flag or by using some built-in instruction(s).
EDIT: I have a value x_int stored in an int32_t or int64_t, and I want to convert it to a float value x_float such that (mathematically, not in a programming language)
x_int>=x_float
is always true. A possible workaround for int32_t is to use double, but I am not sure about int64_t.
Upvotes: 1
Views: 350
Reputation: 35663
Behaviour may depend on the compiler options in force. For example, in MSVC, /fp:fast sacrifices correctness for speed. If this is not what you want, specify /fp:strict or /fp:precise (the default). On Clang, -menable-unsafe-fp-math does something similar.
The floating-point rounding mode is controlled by fesetround (declared in <fenv.h>).
Retrieve the current rounding mode using fegetround so you can restore it later, then use fesetround to set the rounding mode you want (in your case FE_TOWARDZERO if you mean smallest in magnitude, or FE_DOWNWARD otherwise), then cast the value to a float. Finally, restore the rounding mode.
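#include <fenv.h>
#include <stdint.h>

// Convert value to float under the given rounding mode (e.g. FE_DOWNWARD),
// then restore the rounding mode that was active before the call.
inline float cast_with_mode(int32_t value, int mode){
    int prevmode = fegetround();
    if(prevmode == mode) return (float)value; // may be faster without this
    fesetround(mode);
    float result = (float)value;
    fesetround(prevmode);
    return result;
}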
Performance-wise, it may or may not be better to compare prevmode to mode. If the mode is already correct, you don't need to either set it or restore it. Whether the comparison is faster or slower than the set/restore, I don't know.
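If the cost of switching the rounding mode is a concern, or if you need int64_t as mentioned in the question, one alternative is to leave the rounding mode alone, convert normally, and step down by one float whenever the result came out too large. This is a different technique from the one above and only a rough sketch: floor_to_float is a name made up for this example, and it assumes float is IEEE 754 single precision.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of the answer above): returns the largest
// float that is <= value, without touching the rounding mode.
inline float floor_to_float(int64_t value) {
    // Rounded according to whatever rounding mode is currently active.
    float f = static_cast<float>(value);

    // 2^63 does not fit in int64_t, so it cannot be cast back for the
    // comparison below. It is always greater than value, so step down.
    if (f >= 9223372036854775808.0f) {
        return std::nextafter(f, -INFINITY);
    }

    // f is integer-valued and within int64_t range here, so the cast back
    // is exact and the comparison is done in integer arithmetic.
    if (static_cast<int64_t>(f) > value) {
        return std::nextafter(f, -INFINITY);  // previous representable float
    }
    return f;
}
```

The conversion always yields one of the two floats adjacent to the input, so a single step down is enough when the result is too large. For example, floor_to_float(2147483584) gives 2147483520, matching the FE_TOWARDZERO row for positive inputs.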
Example output (same on Clang and G++):
Mode            Value       Value (hex)    Result bits  Result value
FE_TOWARDZERO:  2147483520  0x7fffff80 =>  4effffff     2147483520.000000
FE_UPWARD:      2147483520  0x7fffff80 =>  4effffff     2147483520.000000
FE_TOWARDZERO:  2147483584  0x7fffffc0 =>  4effffff     2147483520.000000
FE_UPWARD:      2147483584  0x7fffffc0 =>  4f000000     2147483648.000000
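Rows like the ones above could be produced with something along these lines. This is only a sketch, not the exact program used for the output: convert_with_mode, float_bits and print_row are names invented here, and the volatile variables are an assumption about what is needed to stop an optimizing compiler from folding the conversion at compile time or moving it across the fesetround calls.

```cpp
#include <cfenv>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical helper: convert under a temporary rounding mode.
// The volatile objects force the conversion to happen at run time,
// between the two fesetround calls.
inline float convert_with_mode(int32_t value, int mode) {
    const int prev = std::fegetround();
    std::fesetround(mode);
    volatile int32_t in = value;
    volatile float out = static_cast<float>(in);
    std::fesetround(prev);
    return out;
}

// Raw bit pattern of a float (assumes a 32-bit IEEE 754 float).
inline uint32_t float_bits(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Print one row: mode name, decimal value, hex value, result bits, result.
inline void print_row(const char* name, int mode, int32_t value) {
    const float r = convert_with_mode(value, mode);
    std::printf("%-15s %11d 0x%08x => %08x %f\n", name,
                static_cast<int>(value), static_cast<unsigned>(value),
                static_cast<unsigned>(float_bits(r)), r);
}
```

Calling print_row("FE_TOWARDZERO:", FE_TOWARDZERO, 2147483584) prints one such row. Note that for negative inputs the two downward-looking modes differ: with -2147483584, FE_TOWARDZERO yields -2147483520, which is larger than the input, while FE_DOWNWARD yields -2147483648, so FE_DOWNWARD is the mode that guarantees x_int >= x_float.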
Upvotes: 2