Reputation: 65
I have to encode the electron charge, which is -1.602 × 10^-19 C, using IEEE-754. I did it manually and verified my result using this site, so I know my representation is good. My problem is that when I write a C program to show the number in scientific notation, I get the wrong number.
Here is my code:
#include <stdio.h>
int main(int argc, char const *argv[])
{
    float q = 0xa03d217b;
    printf("q = %e", q);
    return 0;
}
Here is the result:
$ ./test.exe
q = 2.688361e+09
My question: Is there another representation that my CPU might be using internally for floating point other than IEEE-754?
Upvotes: 3
Views: 4801
Reputation: 30525
Note that C supports hexadecimal floating-point literals (since C99). See https://en.cppreference.com/w/cpp/language/floating_literal for details. This notation lets you write the number portably and exactly, without the rounding concerns that come with regular decimal/scientific notation. Here's the number you're interested in:
#include <stdio.h>
int main(void) {
    float f = -0x1.7a42f6p-63;
    printf("%e\n", f);
    return 0;
}
When I run this program, I get:
$ make a
cc a.c -o a
$ ./a
-1.602000e-19
As long as your compiler supports this notation, you need not worry about how the underlying machine represents floats, provided this particular number fits into its float representation.
Upvotes: 0
Reputation: 154169
"My problem is that, if I try to build a C program showing my number in scientific notation, I get the wrong number."
What if the target machine does not use IEEE-754 encoding? Then copying the bit pattern may fail.
Starting from the binary32 constant 0xa03d217b, code can examine its fields and build up the best float available for that implementation.
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#define BINARY32_MASK_SIGN 0x80000000
#define BINARY32_MASK_EXPO 0x7F800000
#define BINARY32_MASK_SNCD 0x007FFFFF
#define BINARY32_IMPLIED_BIT 0x800000
#define BINARY32_SHIFT_EXPO 23
float binary32_to_float(uint32_t x) {
    // Break up into 3 parts
    bool sign = x & BINARY32_MASK_SIGN;
    int biased_expo = (x & BINARY32_MASK_EXPO) >> BINARY32_SHIFT_EXPO;
    int32_t significand = x & BINARY32_MASK_SNCD;
    float y;
    if (biased_expo == 0xFF) {
        y = significand ? NAN : INFINITY; // For simplicity, NaN payload not copied
    } else {
        int expo;
        if (biased_expo > 0) {
            // Normal number: restore the implied leading 1 bit
            significand |= BINARY32_IMPLIED_BIT;
            expo = biased_expo - 127;
        } else {
            // Zero or subnormal: no implied bit, fixed exponent
            expo = -126;
        }
        y = ldexpf((float)significand, expo - BINARY32_SHIFT_EXPO);
    }
    if (sign) {
        y = -y;
    }
    return y;
}
Sample usage and output
#include <float.h>
#include <stdio.h>
int main(void) {
    float e = -1.602e-19;
    printf("%.*e\n", FLT_DECIMAL_DIG, e);
    uint32_t e_as_binary32 = 0xa03d217b;
    printf("%.*e\n", FLT_DECIMAL_DIG, binary32_to_float(e_as_binary32));
}
-1.602000046e-19
-1.602000046e-19
Upvotes: 2
Reputation: 51884
The line float q = 0xa03d217b; converts the integer (hex) literal into a float value representing that number (or an approximation thereof); thus, the value assigned to your q will be the (decimal) value 2,688,360,827 (which is what 0xa03d217b equates to), as you have noted.
If you must initialize a float variable with its internal IEEE-754 (hex) representation, then your best option is to use type punning via the members of a union (legal in C but not in C++):
#include <stdio.h>
typedef union {
    float f;
    unsigned int h;
} hexfloat;
int main()
{
    hexfloat hf;
    hf.h = 0xa03d217b;
    float q = hf.f;
    printf("%lg\n", q);
    return 0;
}
There are also some 'quick tricks' using pointer casting, like:
unsigned iee = 0xa03d217b;
float q = *(float*)(&iee);
But be aware that there are numerous issues with such approaches, such as potential endianness conflicts and the fact that you're breaking strict aliasing requirements.
Upvotes: 12
Reputation: 812
Hence, q does not contain the value you expect. The hex value is converted to a float with the same value (approximately), not with the same bit representation.
When compiled with clang and the option -Wall, there is a warning:
warning: implicit conversion from 'unsigned int' to 'float' changes value from 2688360827 to 2688360704 [-Wimplicit-const-int-float-conversion]
Can be tested on Compiler Explorer.
This warning is apparently not supported by gcc. Instead, you can use the option -Wfloat-conversion (which is not part of -Wall -Wextra):
warning: conversion from 'unsigned int' to 'float' changes value from '2688360827' to '2.6883607e+9f' [-Wfloat-conversion]
Again on Compiler Explorer.
Upvotes: 1