Reputation: 11
I am trying to convert a decimal number to 32-bit floating point. I have to do this by hand (pencil and paper), and so far my number is
1.11010110111100110100010011(base 2) x 2^26
Now I know that the mantissa can only store 23 bits, so I need to show what the number would look like with rounding and without rounding. My question is: what determines rounding? I know truncation will result in this
1.11010110111100110100010(base 2) x 2^26
Does rounding just look at the bit to the right and round up if it is a 1 and down if it is a 0?
What if the number was
1.11010110111100110100010111(base 2) x 2^26 where there is a one to the right?
What if the bit at 2^3 were a 1 and the bit at 2^2 (just to its right) were also a 1, like in this example
1.11010110111100110100011111(base 2) x 2^26
Thanks, I am just a little unclear about rounding at this stage.
Upvotes: 1
Views: 3585
Reputation: 28806
Rounding is generally done to the nearest representable value. But if the value is exactly halfway between two representable values, i.e. if the highest bit you want to get rid of is 1 and all the lower ones are 0, one of several so-called tie-breaking rules is applied:

- round half up (or half down)
- round half away from zero (or toward zero)
- round half to even ("banker's rounding") or half to odd

Which rule is applied is something that must be defined. AFAIK, most FPUs use banker's rounding (round half to even) as the default.
In our case, you throw away 3 binary digits. 000 is simply truncated; 001-011 always rounds down; 101-111 always rounds up; and 100 invokes the tie-breaking rule. If the result of that rule is to round up, you add one least significant bit to the result and, if necessary, shift accordingly.
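A minimal sketch of that procedure in Python (the function name and interface are my own, not from the post): drop the low 3 bits of a significand string and apply round-to-nearest with ties-to-even:

```python
def round_significand(bits: str, drop: int = 3) -> str:
    """Round off the low `drop` bits of a binary significand string
    using round-to-nearest, ties-to-even (banker's rounding).

    `bits` is the significand without the binary point, e.g.
    '111010110111100110100010011' for 1.11010110111100110100010011.
    The result may come out one bit longer if the round-up carries
    all the way out -- that is the "shift accordingly" case.
    """
    keep, lost = bits[:-drop], bits[-drop:]
    value = int(keep, 2)
    rem = int(lost, 2)
    half = 1 << (drop - 1)  # the halfway pattern 100
    # Round up if the lost bits exceed 100, or equal 100 exactly
    # and the kept value is odd (ties-to-even).
    if rem > half or (rem == half and value & 1):
        value += 1
    return format(value, f"0{len(keep)}b")


# First example: lost bits are 011 < 100, so simply truncate.
print(round_significand("111010110111100110100010011"))
# Third example: lost bits are 111 > 100, so round up.
print(round_significand("111010110111100110100011111"))
```

Running it on the questioner's first and third numbers reproduces the truncated mantissa and the rounded-up mantissa shown above.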
In your first case, you simply truncate, since the discarded bits (011) are below 100. But if this is the value
1.11010110111100110100011111
and you want to remove 3 bits, it is first truncated to
1.11010110111100110100011
but because the bits you threw away were 111, you round up, so you add 1 bit, and it becomes
1.11010110111100110100100
IOW, the lowest bits 011 become 100
Upvotes: 1
Reputation: 212979
Truncation and rounding of binary numbers work much like they do for decimals. In theory you would need to look at all of the discarded bits to do "correct" rounding, but in practice hardware implementations keep only a few extra bits to the right (with the lowest acting as a "sticky" OR of everything below) to determine whether to round up.
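As a sketch of how a few extra bits suffice (Python; the names are my own, not from the answer): a guard bit, a round bit, and a "sticky" OR of all lower discarded bits are enough to implement round-to-nearest, ties-to-even:

```python
def round_nearest_even(sig: int, guard: int, rnd: int, sticky: int) -> int:
    """Apply round-to-nearest, ties-to-even to a kept significand.

    guard  -- first bit shifted out (half the weight of the last kept bit)
    rnd    -- second bit shifted out
    sticky -- OR of every remaining discarded bit
    """
    # Round up when past the halfway point, or exactly halfway
    # (guard=1, everything else 0) with an odd kept significand.
    if guard and (rnd or sticky or (sig & 1)):
        sig += 1
    return sig


print(bin(round_nearest_even(0b1010, 1, 1, 0)))  # above half -> 0b1011
print(bin(round_nearest_even(0b1010, 1, 0, 0)))  # exact tie, even -> 0b1010
```

The point is that the sticky bit collapses arbitrarily many low bits into one flag, so the rounding decision never needs the full discarded tail.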
Upvotes: 1