Reputation: 33
I am trying to figure out how to store a floating-point number in MASM and to understand how it is stored. For example, if I have the number:
1234.56_10 ;Base 10
I have to store it and convert it into base 10. What would I have to do to store the .56 value? I was thinking of storing 1234 first and converting it into base 10, then storing .56 and converting it as well, and later adding the two together.
But the problem I have is with storing .56. I don't know how to store it, and I don't know how it is stored in memory. Is it stored as ASCII characters, or is it stored differently?
Upvotes: 2
Views: 2808
Reputation: 244732
There are basically three different floating-point types in MASM:
REAL4
This is the equivalent of C's float type: a single-precision floating-point value, stored in 4 bytes. It has a range of approximately ±3.4×10^38, with 6 significant digits.
The format is (from high bit to low bit): sign bit, 8-bit exponent, 23-bit mantissa. (The leading 1 is implicit.)
REAL8
This is the equivalent of C's double type: a double-precision floating-point value, stored in 8 bytes. It has a range of approximately ±1.8×10^308, with 15 significant digits.
The format is (from high bit to low bit): sign bit, 11-bit exponent, 52-bit mantissa. (The leading 1 is implicit.)
TBYTE
This is what the x87 FPU natively stores in its registers (which are implemented as a stack), and is often equivalent to C's long double type (if supported by your compiler).
It is a 10-byte value (which is where the type name comes from), with a range of approximately ±1.2×10^4932 and 18 significant digits.
The format is (from high bit to low bit): sign bit, 15-bit exponent, 64-bit mantissa. (Unlike the other two formats, the leading 1 is stored explicitly, as the top bit of the mantissa.)
(The images above are taken from the linked Wikipedia articles, licensed via CC BY-SA 3.0.)
Generally, you will want to use one of the latter two formats, REAL8 or TBYTE, because you usually want as much precision as possible. Plus, there is no significant speed penalty for using higher-precision types on the x87. (The only way you'll see a speed-up is if you limit the processor to working only with single-precision types, and that takes work and is very likely not what you want because of the great loss of precision.)
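To get a feel for what the extra precision buys you, here is a minimal Python sketch (not MASM) that round-trips a value through the 4-byte and 8-byte layouts using the standard struct module. The helper names as_real4 and as_real8 are made up for this illustration, and Python's struct has no 10-byte format, so TBYTE is left out.

```python
import struct


def as_real4(value):
    """Round-trip a Python float (a 64-bit double) through 4-byte single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def as_real8(value):
    """Round-trip through the 8-byte double layout (lossless for a Python float)."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


if __name__ == "__main__":
    x = 1234.56
    # The 4-byte form cannot hold the value as accurately as the 8-byte form.
    print(len(struct.pack("<f", x)), "bytes:", repr(as_real4(x)))
    print(len(struct.pack("<d", x)), "bytes:", repr(as_real8(x)))
```

The single-precision result differs from 1234.56 after roughly the seventh significant digit, while the double-precision round trip gives back the same value.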
So let's look at the 64-bit REAL8 format as an example.
In the real world, floating-point values are often expressed in "scientific notation", which is a base-10 format, e.g., 1.81×10^3. In computing, a binary-based format is used: 1.0110101×2^7. Essentially, the notation is: mantissa × 2^exponent. (The mantissa is also known as the "fraction" portion.)
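As a rough illustration of the binary notation, this Python sketch splits a value into a mantissa between 1 and 2 and a power-of-two exponent. The function name binary_scientific is my own; math.frexp is the standard-library call doing the work.

```python
import math


def binary_scientific(value):
    """Return (mantissa, exponent) with 1 <= |mantissa| < 2 and
    value == mantissa * 2**exponent."""
    if value == 0 or not math.isfinite(value):
        raise ValueError("only finite, non-zero values have a normalized form")
    # frexp returns a fraction with 0.5 <= |fraction| < 1, so shift by one bit.
    fraction, exponent = math.frexp(value)
    return fraction * 2, exponent - 1


if __name__ == "__main__":
    # Binary 1.0110101 is 1.4140625 in decimal, and 1.4140625 * 2**7 == 181.
    print(binary_scientific(181.0))  # (1.4140625, 7)
```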
As always, the number of bits used to store each component determines the precision of that component. In the 64-bit REAL8 format, 52 bits are devoted to storing the mantissa (effectively 53 bits, since there is 1 implied bit). This gives you approximately 15 decimal digits' worth of precision.
Preceding the mantissa's 52 bits are 11 bits used to store the exponent. The exponent field is biased to the middle of the available range, so that negative exponents are stored as smaller values than positive exponents. For REAL8 values, the bias is 0x3FF (1023): the stored field is the true exponent plus 0x3FF.
Finally, the highest-order bit is a sign bit. It is set to 0 if the value is positive, or 1 if the value is negative.
So, the value of +1.0 would have the following representation in REAL8 format:
0 01111111111 0000000000000000000000000000000000000000000000000000
^ |----^----| |--------------------------^-----------------------|
|      |                                 |
|   exponent                         mantissa
|
sign bit
Or, in a simpler hex notation, 0x3FF0000000000000.
For −1.0, everything stays the same, the sign bit just gets flipped:
1 01111111111 0000000000000000000000000000000000000000000000000000
^ |----^----| |--------------------------^-----------------------|
|      |                                 |
|   exponent                         mantissa
|
sign bit
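If you want to check these bit patterns yourself, the following Python sketch pulls the three fields out of a double. The helper names real8_fields and real8_bits are hypothetical; the masks and shifts follow the 1/11/52 layout described above.

```python
import struct


def real8_fields(value):
    """Split a double into (sign, exponent field, mantissa field) as integers."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    sign = bits >> 63                   # highest-order bit
    exponent = (bits >> 52) & 0x7FF     # next 11 bits, still biased by 0x3FF
    mantissa = bits & ((1 << 52) - 1)   # low 52 bits, implicit leading 1 not stored
    return sign, exponent, mantissa


def real8_bits(value):
    """Format the three fields as space-separated binary strings."""
    sign, exponent, mantissa = real8_fields(value)
    return f"{sign:01b} {exponent:011b} {mantissa:052b}"


if __name__ == "__main__":
    print(real8_bits(1.0))
    print(real8_bits(-1.0))
```

Only the first field should differ between the two printed lines.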
The smallest positive normalized value that can be represented in the REAL8 format has an exponent field of 1 and a mantissa of 0 (approximately 2.2×10^-308):
0 00000000001 0000000000000000000000000000000000000000000000000000
^ |----^----| |--------------------------^-----------------------|
|      |                                 |
|   exponent                         mantissa
|
sign bit
The tricky parts are the special kinds of values that can be encoded in this format, like infinities, not-a-number (NaN), and denormals, but you really don't need to worry about those. Basically, in the REAL8 format, the maximum value for the exponent field, 0x7FF, indicates either infinity or NaN, while an exponent field set to 0 together with a non-zero mantissa indicates a "denormal" number. You'll find a more thorough description of this, and all foregoing information, in Intel's documentation for the x87 FPU (an older version is mirrored here on John Loomis's site).
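Those rules are simple enough to sketch in a few lines of Python. The function classify_real8 and its category names are my own invention for illustration; it only looks at the exponent and mantissa fields, as described above, and ignores the sign.

```python
import struct


def real8_to_bits(value):
    """Return the 64-bit pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def classify_real8(bits):
    """Classify a 64-bit REAL8 pattern by its exponent and mantissa fields."""
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent == 0x7FF:   # all exponent bits set
        return "infinity" if mantissa == 0 else "NaN"
    if exponent == 0:       # all exponent bits clear
        return "zero" if mantissa == 0 else "denormal"
    return "normal"


if __name__ == "__main__":
    for pattern in (0x3FF0000000000000, 0x0010000000000000,
                    0x0000000000000001, 0x7FF0000000000000):
        print(f"0x{pattern:016X}", classify_real8(pattern))
```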
But in reality, no one thinks about this or does this conversion by hand. You either get your assembler to do it, or you use a converter (there are plenty of others to choose from online; I just search and use whichever one pops up first).
For example, what if we didn't know about the FLDPI instruction and wanted to store π as a REAL8 value? Well, we'd plug in 3.14159, and see that its binary REAL8 representation is:
0 10000000000 1001001000011111100111110000000110111000011001101110
^ |----^----| |--------------------------^-----------------------|
|      |                                 |
|   exponent                         mantissa
|
sign bit
Or, in hexadecimal: 0x400921F9F01B866E. Of course, this is an inexact value, closer to 3.141589999…, but floating-point is like that.
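If you'd rather not depend on an online converter, a small Python sketch can stand in for one. The names real8_hex and exact_decimal are hypothetical helpers; struct and decimal are standard-library modules.

```python
import struct
from decimal import Decimal


def real8_hex(value):
    """Return the 64-bit REAL8 pattern of a value as 16 hex digits."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    return f"0x{bits:016X}"


def exact_decimal(value):
    """Return the exact decimal expansion of the double that is actually stored."""
    return Decimal(value)


if __name__ == "__main__":
    print(real8_hex(3.14159))      # 0x400921F9F01B866E
    print(exact_decimal(3.14159))  # the stored value, not exactly 3.14159
```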
As for your question, I think you must be misusing the term "base 10". 1234.56_10 is already a base-10 value; no conversion is necessary. If you stored it in REAL8 format, it would be:
0 10000001001 0011010010100011110101110000101000111101011100001010
^ |----^----| |--------------------------^-----------------------|
|      |                                 |
|   exponent                         mantissa
|
sign bit
or, equivalently, 0x40934A3D70A3D70A. If you had that and wanted to convert it back into the familiar base-10 notation, you'd have to parse the format as I described it throughout this answer. I honestly don't know why you want to do that, but you could now if you really wanted to. I'd use a converter: plug in 40934A3D70A3D70A, click "Rounded", and get back 1234.56.
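For completeness, parsing the format by hand might look something like the Python sketch below. decode_real8 is a made-up name, and the sketch assumes a normal value: zero, denormals, infinities, and NaNs are rejected rather than handled.

```python
def decode_real8(hex_digits):
    """Decode the 16 hex digits of a normal REAL8 value by hand."""
    bits = int(hex_digits, 16)
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent in (0, 0x7FF):
        raise ValueError("zero, denormal, infinity and NaN are not handled here")
    significand = 1 + mantissa / 2**52   # restore the implicit leading 1
    magnitude = significand * 2.0 ** (exponent - 0x3FF)  # remove the bias
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    print(decode_real8("3FF0000000000000"))  # 1.0
    print(decode_real8("400921F9F01B866E"))  # 3.14159
```

Printing the result gives the shortest decimal string that maps back to the same bits, which is effectively what the converter's "Rounded" option does.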
Otherwise, it sounds like you might be trying to re-invent fixed-point arithmetic, which basically stores real/decimal values as integers. You'd store 1234 as one integer and 56 as another, then combine them. The decimal point's position would be implicit, and you could add it back in for display purposes. See also: Fixed Point Arithmetic and Tricks (for the x86).
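A minimal Python sketch of that fixed-point idea, assuming two implied decimal places and non-negative values only. SCALE, to_fixed, and fixed_to_str are names I made up for the illustration.

```python
SCALE = 100  # two implied decimal places (an assumption for this sketch)


def to_fixed(text):
    """Parse a non-negative decimal string such as '1234.56' into a scaled integer."""
    whole, _, frac = text.partition(".")
    frac = (frac + "00")[:2]  # pad or truncate to exactly two fractional digits
    return int(whole or "0") * SCALE + int(frac)


def fixed_to_str(value):
    """Put the implicit decimal point back for display."""
    whole, frac = divmod(value, SCALE)
    return f"{whole}.{frac:02d}"


if __name__ == "__main__":
    a = to_fixed("1234.56")           # stored as the integer 123456
    b = to_fixed("0.44")              # stored as the integer 44
    print(a, fixed_to_str(a + b))     # 123456 1235.00
```

All arithmetic happens on plain integers, so there is no rounding until you choose to drop digits.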
Upvotes: 7