Reputation: 3755
Let's say I have a very large 64-bit value measuring time in microseconds. I want to convert that to a float measured in seconds, which means dividing the time value by 1000000. In what order can I perform the division and the conversion without losing data?
If I perform the division first, the value is still an integer, and I lose the sub-second portion of the measurement. If I convert the value to float first, the measurement gets rounded from 64 bits to the float's 24-bit significand, and the result is incorrect.
Is the compiler smart enough to know what to do if I perform both operations at once? Or does this require manually breaking it up into pieces?
I understand that I may still lose precision in the final float by dividing down so far, and that is fine. I want to avoid any extra loss due to the way the conversion is performed.
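For illustration, here is a minimal sketch (using a made-up microsecond count) of how each ordering loses data when the target type is float:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Hypothetical timestamp: about 35 minutes plus a fraction of a second, in microseconds. */
    uint64_t us = 2123456789ULL;

    /* Divide first: integer division discards the sub-second part (.456789 s is gone). */
    float divide_first = (float)(us / 1000000ULL);

    /* Convert first: the 64-bit value is rounded to float's 24-bit significand
       before the division, so the low-order microseconds are already lost. */
    float convert_first = (float)us / 1000000.0f;

    printf("divide first : %.6f\n", divide_first);   /* 2123.000000 */
    printf("convert first: %.6f\n", convert_first);  /* about 2123.4568, not 2123.456789 */
    return 0;
}
```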
Upvotes: 0
Views: 718
Reputation: 6134
Since you mention 24 bits, I assume you are using single-precision floating point numbers (float). Use double precision and you will get a 53-bit mantissa; that is enough to represent any microsecond count below 2^53 (roughly 285 years' worth) exactly. Otherwise, use long double, which gives you a 64-bit mantissa with gcc on x86-64.
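For example, a sketch with a made-up microsecond count, converting through double (and, assuming gcc on x86-64, long double), keeps the sub-second digits that float would round away:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t us = 2123456789ULL;   /* hypothetical microsecond count */

    /* double has a 53-bit significand, so any microsecond count below 2^53
       converts exactly; only the final division result is rounded. */
    double seconds = (double)us / 1e6;

    /* long double on gcc/x86-64 (x87 extended precision) can hold
       any 64-bit integer exactly. */
    long double seconds_ld = (long double)us / 1000000.0L;

    printf("double     : %.6f\n", seconds);     /* 2123.456789 */
    printf("long double: %.6Lf\n", seconds_ld); /* 2123.456789 */
    return 0;
}
```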
Upvotes: 2