Jwan622

Reputation: 11639

Floating point number calculation using a 32 bit word

I am reading the fifth edition of Patterson's computer organization book and I am confused by these two pages of text. First page:

[Image: first page from the textbook]

Is the first word equal to 0.5 in decimal? I see that the sign is 0, the exponent is -1, and the fraction is 0 with an implied 1 in the significand. So 1.0_two * 2^-1 = 0.5? Is that right?

Why is 1.0 * 2^1 the "smaller binary number"? Isn't the second word bigger? It has a 0 in the sign, a 1 in the exponent, and an implied 1 in the significand, so 1.0 * 2^1 = 2. Is that right?

I don't understand the paragraph that says:

The desirable notation must therefore represent the most negative exponent as 00 ... 00_two and the most positive as 11 ... 11_two. This convention is called biased notation, with the bias being the number subtracted from the normal, unsigned representation to determine the real value.

Upvotes: 1

Views: 416

Answers (1)

Chris Dodd

Reputation: 126203

If you look at them just as 32-bit binary numbers (with the exponent field stored in 2s-complement), the first one is 0x7f800000 while the second is 0x00800000. So the second is a smaller binary number even though it represents a larger floating-point number, and a plain binary comparison or sort would do the wrong thing.
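To see where those two hex values come from, here is a minimal Python sketch. It assumes the hypothetical layout the book is arguing against (1 sign bit, an 8-bit 2s-complement exponent, 23 fraction bits); `pack_twos_complement` is an illustrative helper name, not a real format or library function.

```python
def pack_twos_complement(sign: int, exponent: int, fraction: int = 0) -> int:
    """Pack a hypothetical 32-bit float whose 8-bit exponent is 2s-complement."""
    exp_field = exponent & 0xFF  # -1 becomes 0xFF, +1 becomes 0x01
    return (sign << 31) | (exp_field << 23) | (fraction & 0x7FFFFF)

half = pack_twos_complement(0, -1)  # 1.0 * 2**-1 = 0.5
two = pack_twos_complement(0, 1)    # 1.0 * 2**1  = 2.0

print(f"{half:#010x} {two:#010x}")  # 0x7f800000 0x00800000
print(half > two)                   # True: the bits for 0.5 compare greater than the bits for 2.0
```

The negative exponent turns into all ones in the exponent field, which is why the word for 0.5 looks like a huge integer.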

So instead a biased representation is used for the exponent (the stored field is the true exponent plus 127), which means the binary value for 0.5 is 0x3f000000 and the binary value for 2.0 is 0x40000000, and the binary comparison "works" for comparing and sorting floating-point numbers.
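You can check these bit patterns yourself. The sketch below uses Python's standard `struct` module to view a single-precision float as an unsigned 32-bit integer; `float_bits` is just an illustrative helper name.

```python
import struct

def float_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

half_bits = float_bits(0.5)
two_bits = float_bits(2.0)
print(f"{half_bits:#010x} {two_bits:#010x}")  # 0x3f000000 0x40000000

# The stored exponent field is the true exponent plus the bias of 127.
print((half_bits >> 23) & 0xFF)  # 126, i.e. -1 + 127
print((two_bits >> 23) & 0xFF)   # 128, i.e. +1 + 127

print(half_bits < two_bits)      # True: integer order now matches float order
```

With the bias, the most negative exponent maps to the smallest field value and the most positive to the largest, which is exactly the property the quoted paragraph asks for.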

The remaining problem is that this is still a sign+magnitude representation, so you need a sign+magnitude binary comparison, while most hardware does 2s-complement integer comparison. So you still end up needing special floating-point comparison instructions/hardware.
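A small sketch of that last point, again using `struct`. It shows a 2s-complement comparison of the raw bits going wrong for negative values, plus one common software workaround (remapping the bits before an unsigned compare). The workaround is my illustration, not what floating-point hardware necessarily does; the helper names are made up, and NaNs are ignored.

```python
import struct

def signed_bits(x: float) -> int:
    """Reinterpret the single-precision bits of x as a 2s-complement int32."""
    return struct.unpack(">i", struct.pack(">f", x))[0]

def sortable_key(x: float) -> int:
    """Map the bits of x to an unsigned int ordered like the floats (NaNs ignored)."""
    u = struct.unpack(">I", struct.pack(">f", x))[0]
    if u & 0x80000000:
        # Negative: invert every bit so larger magnitudes sort lower.
        return ~u & 0xFFFFFFFF
    # Non-negative: set the top bit so these sort above all negatives.
    return u | 0x80000000

# -2.0 < -0.5 as floats, but the 2s-complement view of the bits disagrees.
print(signed_bits(-2.0) < signed_bits(-0.5))  # False

# After remapping, plain unsigned comparison gives the float order.
keys = [sortable_key(v) for v in (-2.0, -0.5, 0.5, 2.0)]
print(keys == sorted(keys))  # True
```

Note that this key places -0.0 just below +0.0, whereas a real floating-point compare treats them as equal, which is one more reason dedicated comparison logic exists.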

Upvotes: 1
