Reputation: 7913
I have a question about the difference between how a float and an int are represented as bytes in a computer system. For instance, how does a value convert from a float to an int and back? The conversion results in a completely different representation of the bytes. Are there any resources that could point me in the right direction? I cannot find anything online, so a link would be very helpful. Any help is much appreciated!
Upvotes: 3
Views: 3360
Reputation: 76695
In the general case, it is dangerous to think of floating point values as being "exact". Once you have a fraction, a float will probably be approximate. (If the fraction can be exactly expressed in binary, it might still be exact, like 1/2. But for example, 1/10 will be an approximate value.)
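A small way to check this yourself, as a sketch only: the helper name `widens_exactly` is made up for illustration, and the behaviour described assumes a binary floating-point format such as IEEE 754, which C does not guarantee. It compares a float constant with the double constant written the same way. If the fraction is exact in binary, both hold the same value; if not, each type rounds it differently and the comparison fails.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the float f, widened to double,
   equals the double d. Widening float to double is exact, so any
   mismatch comes from rounding when the constants were stored. */
int widens_exactly(float f, double d)
{
    return (double)f == d;
}

/* Prints a float with enough digits to expose the stored value. */
void show_stored_value(const char *label, float x)
{
    printf("%s is stored as %.20f\n", label, (double)x);
}
```

On a typical binary floating-point platform, `widens_exactly(0.5f, 0.5)` is true, while `widens_exactly(0.1f, 0.1)` is false because neither constant is exactly 1/10.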
Several people have pointed you to Wikipedia articles. Read those. Basically, a float value is a sign bit, an exponent, and a mantissa. If you have a float value of 50.0 then the float value will look nothing like the integer representation of 50. (Original incorrect discussion deleted; see comments for details.)
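To see how different 50.0 and 50 look in memory, here is a minimal sketch. It assumes float is 32 bits wide and uses IEEE 754 single precision, which is common but not required by C. The helper name `float_bits` is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* This sketch only makes sense if float is 32 bits wide. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: returns the raw bit pattern of a float.
   memcpy copies the bytes without converting the value, and avoids
   the aliasing problems of casting a float* to a uint32_t*. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Under those assumptions, `float_bits(50.0f)` is 0x42480000, whereas the integer 50 is simply 0x00000032.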
There is no simple way to transform an integer to a float or vice versa. The CPU of your computer has built-in instructions to do the transformation for you, or you can write a program that will do it, but there isn't a simple trick.
EDIT: Above was edited to remove a part where I wrote some incorrect stuff. @Eric Postpischil pointed out the major mistake I made: a value of 50 would be stored as a fractional value raised to a power greater than one, not what I originally said. That was a dumb mistake, and I apologize. He also pointed out that the "mantissa" part is technically the "significand"; I will simply say that I have often seen it called the "mantissa", whether that is correct or not.
I'll repeat the important part: there is no simple way to transform an integer to a float or vice versa.
Upvotes: 0
Reputation: 320371
The representations of float and int on a specific platform are strictly defined and known to the C compiler on that platform, which means that there is always a well-defined algorithm for converting one to the other. In practice, when the natural platform-specific types are used, the conversion is performed internally by the CPU (FPU). The CPU has a dedicated instruction that reads the float data from memory into the internal CPU registers. Another instruction can then write that data back into memory as an int value. And vice versa.
For example, on the x86 platform (using the x87 FPU) a float_value = int_value assignment will generally be translated into a sequence of CPU instructions like
fild int_value ; load the `int` value from `int_value` into an FPU register, converting it to floating point
fstp float_value ; store the register to `float_value` as a `float` and pop it
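In C source these conversions are just casts or assignments, and the compiler picks the instructions. A short sketch follows; the function names are made up, and the specific results mentioned afterwards assume IEEE 754 single precision with the default rounding mode.

```c
/* int -> float: the value is preserved if it fits in the float's
   significand, otherwise it is rounded to a nearby representable
   float. */
float int_to_float(int i)
{
    return (float)i;
}

/* float -> int: the fractional part is discarded (truncation toward
   zero). The behavior is undefined if the truncated value does not
   fit in an int. */
int float_to_int(float f)
{
    return (int)f;
}
```

For instance, `float_to_int(3.99f)` gives 3, and on a typical platform `float_to_int(int_to_float(16777217))` gives 16777216, because 16777217 needs 25 significant bits and a single-precision float only has 24.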
When it comes to converting arithmetic data types that are not immediately supported by the hardware, the C compiler has to spell out all necessary conversion steps literally in the generated code. Sometimes one might need support for integer or floating-point types that are not even supported by the language, in which case one has to implement the conversion manually.
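As a rough idea of what a manual conversion involves, here is a sketch that builds the IEEE 754 single-precision bit pattern for an unsigned 32-bit integer using only integer operations. The function name `uint_to_float_bits` is made up, the target format is an assumption (IEEE 754 single, round to nearest with ties to even), and real compiler support routines are more general than this.

```c
#include <stdint.h>

/* Hypothetical helper: returns the IEEE 754 single-precision bit
   pattern representing the value n, rounded to nearest (ties to even). */
uint32_t uint_to_float_bits(uint32_t n)
{
    if (n == 0)
        return 0;                      /* +0.0 is all zero bits */

    /* The position of the highest set bit is the binary exponent. */
    int exp = 31;
    while ((n >> exp) == 0)
        exp--;

    /* Shift so the leading 1 sits at bit 31, then split the rest into
       the 23 fraction bits that fit and the 8 bits that do not. */
    uint32_t norm = n << (31 - exp);
    uint32_t frac = (norm >> 8) & 0x7FFFFFu;   /* leading 1 is implicit */
    uint32_t rest = norm & 0xFFu;

    uint32_t bits = ((uint32_t)(exp + 127) << 23) | frac;

    /* Round to nearest, ties to even. A carry out of the fraction
       bumps the exponent, which is correct because the two fields
       are adjacent in the encoding. */
    if (rest > 0x80u || (rest == 0x80u && (frac & 1u)))
        bits++;

    return bits;
}
```

For example, `uint_to_float_bits(50)` yields 0x42480000, which is the single-precision encoding of 50.0.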
Upvotes: 0
Reputation: 222302
The most common encoding of floating-point numbers uses IEEE 754. For single-precision numbers, there is a sign bit (s), 8 exponent bits (e), and 23 fraction bits (f).
For most values of s, e, and f, the value represented is $(-1)^s \cdot 2^{e-127} \cdot F$, where F is the number you get by writing “1.” followed by the 23 bits of f and then interpreting that string as a binary numeral. E.g., if f is 10000000000000000000000, then the binary numeral is 1.10000000000000000000000, which is (in decimal) 1.5, so F is 1.5.
The above holds whenever 0 < e < 255. The values 0 and 255 are special.
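The normal-number case can be sketched in C as follows. This assumes float is a 32-bit IEEE 754 single-precision value; the struct and function names are made up for illustration, and `normal_value` deliberately handles only 0 < e < 255.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three encoded fields. */
struct float_fields { uint32_t s, e, f; };

/* Splits the bit pattern of x into sign, exponent and fraction. */
struct float_fields split_float(float x)
{
    uint32_t bits;
    struct float_fields r;
    memcpy(&bits, &x, sizeof bits);
    r.s = bits >> 31;             /* 1 sign bit      */
    r.e = (bits >> 23) & 0xFFu;   /* 8 exponent bits */
    r.f = bits & 0x7FFFFFu;       /* 23 fraction bits */
    return r;
}

/* Recomputes (-1)^s * 2^(e-127) * F for normal numbers only. */
double normal_value(struct float_fields p)
{
    double v = 1.0 + (double)p.f / 8388608.0;  /* F = 1.f; 8388608 = 2^23 */
    int k = (int)p.e - 127;
    for (; k > 0; k--) v *= 2.0;   /* scaling by powers of two is exact here */
    for (; k < 0; k++) v /= 2.0;
    return p.s ? -v : v;
}
```

For -6.0f the fields come out as s = 1, e = 129, f = 0x400000 (so F is 1.5), and `normal_value` gives back -6.0.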
When e is 0, the value represented is the same as above except that you start F with “0.” instead of “1.” and the power of two is fixed at 2^(-126), as if e were 1, rather than 2^(-127). In particular, if f is zero, then the value represented is zero. If f is not zero, these are called denormal (or subnormal) numbers, because they are smaller than the normal values represented in the primary way above.
When e is 255 and f is 0, the value represented is +infinity or -infinity, according to the sign bit, s. When e is 255 and f is not zero, the value represented is called a NaN, Not a Number, which is used for debugging or catching errors or other special purposes. There are quiet NaNs (which do not cause traps; they are typically used when you want to continue calculations to get a final result, then figure out what to do about a NaN) and signaling NaNs (which do cause traps; they are typically used when you want to abort a calculation because an error has occurred).
There may be variations in how the encoding appears on different platforms, especially the ordering of bytes within the 32 bits. And some platforms do not use IEEE 754 encodings.
Double-precision encoding uses the same scheme, except e is 11 bits, the 127 (called the exponent bias) is changed to 1023, and f is 52 bits. Also the special value for the exponent is its 11-bit maximum, 2047, rather than the 8-bit maximum, 255.
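The same field extraction for double precision might look like this, assuming double is a 64-bit IEEE 754 value. The struct and function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the fields of a double. */
struct double_fields { uint64_t s, e, f; };

/* Splits the bit pattern of x into sign, exponent and fraction. */
struct double_fields split_double(double x)
{
    uint64_t bits;
    struct double_fields r;
    memcpy(&bits, &x, sizeof bits);
    r.s = bits >> 63;                    /* 1 sign bit                    */
    r.e = (bits >> 52) & 0x7FFu;         /* 11 exponent bits, bias 1023   */
    r.f = bits & 0xFFFFFFFFFFFFFull;     /* 52 fraction bits              */
    return r;
}
```

For 50.0 this gives s = 0, e = 1028 (that is, 1023 + 5), and f = 0x9000000000000, since 50 is 1.5625 times 2 to the 5th power.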
Upvotes: 4
Reputation: 45057
The C standard does not specify how floating-point values are represented. Each implementation/platform is free to implement floating-point numbers however it sees fit. There are certain constraints that implementations must adhere to, but the bit-level representation is implementation-defined.
Since you can't predict how a floating-point number will be stored or represented in general, there is no real way to answer this question as written. If you have a particular floating-point implementation in mind, we may be able to provide some details. Those details, however, would only be relevant to that specific implementation. Also note that different CPU architectures may convert values differently, even if they use the same floating-point representation.
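What a program can do portably is ask the implementation about its own floating-point parameters through `<float.h>`. The sketch below uses a made-up function name and checks whether float has the parameters of IEEE 754 single precision. It says nothing about byte order, and an implementation may additionally define `__STDC_IEC_559__` to claim IEEE 754 conformance.

```c
#include <float.h>
#include <limits.h>

/* Hypothetical helper: returns 1 if float has the parameters of IEEE
   754 single precision: radix 2, 24 significant bits (23 stored plus
   the implicit leading 1), the matching exponent range, and a total
   width of 32 bits. It does not inspect the actual bit layout. */
int float_looks_like_ieee_single(void)
{
    return FLT_RADIX == 2
        && FLT_MANT_DIG == 24
        && FLT_MAX_EXP == 128
        && FLT_MIN_EXP == -125
        && sizeof(float) * CHAR_BIT == 32;
}
```

Code that pokes at the bits of a float can call a check like this first and refuse to run, or fall back to value-based arithmetic, when it returns 0.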
Upvotes: 0