Reputation: 1307
I know that in C and Java, float's underlying representation is IEEE754-32 and double's is IEEE754-64.
In expressions, float will be auto-promoted to double. So how does that happen?
Take 3.7f for example. Is the process like this?
- 3.7f is represented in memory using IEEE754-32; it fits in 4 bytes.
- During calculation, it may be loaded into a 64-bit register (or some other 64-bit location), turning 3.7f into its IEEE754-64 representation.
Upvotes: 4
Views: 1082
Reputation: 41764
I know that in C and Java, float's underlying representation is IEEE754-32 and double's is IEEE754-64.
Wrong! The C standard has never specified a fixed format or specific sizes for the integer and floating-point types, although it does guarantee the relations between them:
1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
sizeof(float) <= sizeof(double) <= sizeof(long double)
C implementations are allowed to use any floating-point format, although nowadays most use IEEE-754. Likewise, they may freely use any of the 3 allowed integer representations, including ones' complement and sign-magnitude.
How is float variable auto-promoted to double type?
It's never promoted in standard C. Pre-standard (K&R) C promoted float operands to double in expressions, but in C89/90 the rule was changed, and float * float produces a float result. The usual arithmetic conversions say:
- If either operand has type long double, the other operand is converted to long double.
- Otherwise, if either operand is double, the other operand is converted to double.
- Otherwise, if either operand is float, the other operand is converted to float.
The fixed-format claim would be true in Java or C#, though, since they run bytecode on a virtual machine and the VM's types are consistent across platforms.
Upvotes: 1
Reputation: 349
Excellent answer by phuclv. I'd like to add a few things.
In C, a cast binds only to the operand it is written in front of, and conversions are applied per operator by the usual arithmetic conversions. Casting one operand of an operator to double forces the other operand of that same operator to be converted to double before the operation is evaluated; operands of other operators in the expression are unaffected, which is where the surprises come from.
So for instance:
float a, b, c, d;
// In the following, a is cast to double, which forces b to be converted
// to double as well, so the multiplication is done in double.
// This holds for the floating-point operators (* / + -); note that % is
// not defined for floating types (use fmod from <math.h> instead).
c = (double)a * b;
// However, operator precedence can produce surprising results:
d = (double)a + b * c;
// There are implied parentheses around the multiplication, so an
// equivalent expression is
d = (double)a + (b * c);
// b and c are multiplied in float; their product is converted to double
// only when it is added to (double)a.
// You may need to cast twice to get the result you want:
d = (double)a + ((double)b * c);
And to add some details to the definitions of float, double and long double: in all implementations that I know of, float is indeed 32 bits, double is 64 bits, and both are IEEE754 formats. long double, however, is not only implementation-dependent but may also be hardware-dependent.
Under Visual Studio / MSVC, a long double might be either 80 or 64 bits. If the x87 math coprocessor is being used, 80 bits are used to store the x87 internal register value. If SSE registers are being used, Microsoft took a shortcut and made double and long double identical for all practical purposes. Note that where phuclv writes sizeof(double) <= sizeof(long double), this is a case where the two are the same size, which is completely legal. And to further confuse things, under Windows the x87 is configured so that only 64 bits are used rather than the full 80 bits available, so even where long double occupies 80 bits, 16 of them will usually be meaningless.
Using the GNU compiler it is possible to use a 128-bit floating-point type, but even there long double does not mean anything wider than 64 bits is guaranteed; the 128-bit type is a separate type, __float128.
Upvotes: 1
Reputation: 320421
It is very implementation-dependent.
For one example, on the x86 platform the FPU instruction set includes instructions for loading/storing data in IEEE754 float and double formats (as well as many other formats). The data is loaded into internal FPU registers that are 80 bits wide. So in reality on x86 all floating-point calculations are performed with 80-bit floating-point precision, i.e. all floating-point data is effectively promoted to 80-bit precision. How the data is represented inside those registers is completely irrelevant, since you cannot observe them directly anyway.
This means that on the x86 platform there's no such thing as a single-step float-to-double conversion. Whenever such a conversion is needed, it is actually implemented as a two-step conversion: float-to-internal-FPU and internal-FPU-to-double.
This BTW creates a significant semantic difference between the x86 FPU computation model and the C/C++ computation model. To fully match the language model, the processor has to forcefully reduce the precision of intermediate floating-point results, which negatively affects performance. Many compilers provide options that control the FPU computation model, letting the user opt for strict C/C++ conformance, better performance, or something in between.
Not so many years ago the FPU was an optional component of the x86 platform. Floating-point computations on FPU-less platforms were performed in software, either by emulating an FPU or by generating code without any FPU instructions at all. In such implementations things could work differently, for example by performing a software conversion from IEEE754 float to IEEE754 double directly.
Upvotes: 6