I'm creating an emulator for a hypothetical CPU. The CPU has 16 8-bit registers, each of which can represent either a signed byte or an 8-bit float.
Both SByte and FByte contain a byte member variable.
I have worked out how to get the real value of a floating byte using the following:
FByte = SEEEEMMM
value = (-1)^S + 1.M^(E-7)
S = Sign bit
M = Mantissa
E = Exponent
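Here is a minimal Java sketch of that decoding. It assumes the + in the formula is meant to be a multiplication and that the exponent applies to 2, i.e. value = (-1)^S * 1.M * 2^(E-7). It also assumes every bit pattern is a normal number, with no special encodings for zero, infinity, or NaN. The class name `FByteDecode` and method `decode` are hypothetical. `Math.scalb` multiplies by an exact power of two.

```java
class FByteDecode {
    // Decodes an 8-bit float laid out as SEEEEMMM, assuming
    // value = (-1)^S * 1.M * 2^(E - 7) for every bit pattern
    // (no special zero, subnormal, infinity, or NaN encodings).
    public static double decode(byte b) {
        int bits = b & 0xFF;              // treat the byte as unsigned
        int sign = (bits >> 7) & 0x1;     // S: bit 7
        int exponent = (bits >> 3) & 0xF; // E: bits 6-3
        int mantissa = bits & 0x7;        // M: bits 2-0

        // 1.M as a number: the implied leading 1 plus M/8
        double significand = 1.0 + mantissa / 8.0;
        // Scale by 2^(E - 7), where 7 is the exponent bias
        double magnitude = Math.scalb(significand, exponent - 7);
        return sign == 1 ? -magnitude : magnitude;
    }

    public static void main(String[] args) {
        System.out.println(decode((byte) 0b0_0111_000)); // 1.0
        System.out.println(decode((byte) 0b1_1000_110)); // -3.5
    }
}
```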
How would I go about converting a given double value (e.g. -3.562) into this float representation (SEEEEMMM)?
Thanks in advance!
EDIT: I know how to do this in theory (write the value in base-2 scientific notation and read off the binary representation), but doing it that way in my program would require String manipulation, and I'd rather keep String intermediaries out of it.
The basic plan for converting a double to your float representation should be:

1. Convert the double to a long using Double.doubleToLongBits. This gives the IEEE 754 representation of the double.
2. Extract the sign, exponent, and mantissa of the double by using bit operations on the doubleToLongBits result. Bit 63 is the sign bit, bits 62-52 are the biased exponent, and bits 51-0 are the mantissa.
3. Take the top 3 bits of the double's mantissa (bits 51-49) as your mantissa, rounding based on the bits below them if you want to. If the mantissa is 0b111 and you decide you need to round up, write your code very carefully, because now the mantissa goes from [1].111 to [1]0.000, which means you will need to shift one to the right (to get [1].000), which will impact the resulting exponent. (I'm using [1] to indicate an implied 1 bit in the mantissa.)
4. To get the new exponent, subtract 1023 from the biased exponent and add 7. 1023 is the bias of a double, and 7 appears to be the bias of your floating-point type. The result will be the new exponent, but it could be out of range. [Also, you might have to add another 1 to the new exponent if you round up, as noted above.]
5. The sign bit is the same as the sign bit of the double. (I'm assuming that you meant the formula to be (-1)^S * 1.M * 2^(E-7), with * instead of +.)

See https://en.wikipedia.org/wiki/IEEE_floating_point for more information about the format of a double.
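The steps above can be sketched in Java roughly as follows. This is a minimal sketch under assumptions the question does not settle. The hypothetical `encode` method rounds to nearest with ties to even, because no rounding mode was specified. It also throws an IllegalArgumentException for zero, subnormals, infinity, NaN, and anything whose final exponent falls outside 0-15, because SEEEEMMM as described has no encoding for those values. The class name `FByteEncode` is also hypothetical.

```java
class FByteEncode {
    // Encodes a double into the 8-bit SEEEEMMM format, assuming
    // value = (-1)^S * 1.M * 2^(E - 7) with no special encodings.
    // Rounding mode (nearest, ties to even) is an assumption.
    public static byte encode(double d) {
        long bits = Double.doubleToLongBits(d);
        int sign = (int) ((bits >>> 63) & 0x1);          // bit 63
        int biasedExp = (int) ((bits >>> 52) & 0x7FF);   // bits 62-52
        long fraction = bits & 0x000F_FFFF_FFFF_FFFFL;   // bits 51-0

        // Zero/subnormal (0) and infinity/NaN (0x7FF) have no assumed encoding
        if (biasedExp == 0 || biasedExp == 0x7FF) {
            throw new IllegalArgumentException("not representable: " + d);
        }

        int mantissa = (int) (fraction >>> 49);   // keep bits 51-49
        long rest = fraction & ((1L << 49) - 1);  // discarded bits 48-0
        long half = 1L << 48;                     // halfway point of the discarded part
        if (rest > half || (rest == half && (mantissa & 1) == 1)) {
            mantissa++;                           // round up
        }

        int exponent = biasedExp - 1023 + 7;      // rebias from 1023 to 7
        if (mantissa == 8) {
            // [1].111 rounded up to [1]0.000: shift right one, bump the exponent
            mantissa = 0;
            exponent++;
        }

        if (exponent < 0 || exponent > 15) {
            throw new IllegalArgumentException("exponent out of range: " + d);
        }
        return (byte) ((sign << 7) | (exponent << 3) | mantissa);
    }

    public static void main(String[] args) {
        byte b = encode(-3.562);
        System.out.println(Integer.toBinaryString(b & 0xFF)); // 11000110
    }
}
```

For -3.562 this produces 11000110 (S = 1, E = 8, M = 6), which the formula decodes back to -3.5, the nearest representable value. The range check is done after rounding, so a value just below 2^-7 that rounds up to exactly 2^-7 is still accepted.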