Reputation: 19
i wrote the following test code to check fixed point arithmetic and bit shifting.
void main(){
float x = 2;
float y = 3;
float z = 1;
unsigned int * px = (unsigned int *) (& x);
unsigned int * py = (unsigned int *) (& y);
unsigned int * pz = (unsigned int *) (& z);
*px <<= 1;
*py <<= 1;
*pz <<= 1;
*pz =*px + *py;
*px >>= 1;
*py >>= 1;
*pz >>= 1;
printf("%f %f %f\n",x,y,z);
}
The result is 2.000000 3.000000 0.000000
Why is the last number 0? I was expecting to see a 5.000000 I want to use some kind of fixed point arithmetic to bypass the use of floating point numbers on an image processing application. Which is the best/easiest/most efficient way to turn my floating point arrays into integers? Is the above "tricking the compiler" a robust workaround? Any suggestions?
Upvotes: 1
Views: 7409
Reputation: 86651
It's probable that your compiler uses IEEE 754 format for float
s, which in bit terms, looks like this:
SEEEEEEEEFFFFFFFFFFFFFFFFFFFFFFF
^ bit 31 ^ bit 0
S
is the sign bit s = 1 implies the number is negative.
E
bits are the exponent. There are 8 exponent bits giving a range of 0 - 255 but the exponent is biased - you need to subtract 127 to get the true exponent.
F
bits are the fraction part, however, you need to imagine an invisible 1 on the front so the fraction is always 1.something and all you see are the binary fraction digits.
The number 2 is 1 x 21 = 1 x 2128 - 127 so is encoded as
01000000000000000000000000000000
So if you use a bit shift to shift it right you get
10000000000000000000000000000000
which by convention is -0 in IEEE754, so rather than multiplying your number by 2 your shift has made it zero.
The number 3 is [1 + 0.5] x 2128 - 127
which is represented as
01000000010000000000000000000000
Shifting that left gives you
10000000100000000000000000000000
which is -1 x 2-126 or some very small number.
You can do the same for z, but you probably get the idea that shifting just screws up floating point numbers.
Upvotes: 2
Reputation: 91017
What you are doing are cruelties to the numbers.
First, you assign values to float variables. How they are stored is system dependant, but normally, IEEE 754 format is used. So your variables internally look like
x = 2.0 = 1 * 2^1 : sign = 0, mantissa = 1, exponent = 1 -> 0 10000000 00000000000000000000000 = 0x40000000
y = 3.0 = 1.5 * 2^1 : sign = 0, mantissa = 1.5, exponent = 1 -> 0 10000000 10000000000000000000000 = 0x40400000
z = 1.0 = 1 * 2^0 : sign = 0, mantissa = 1, exponent = 0 -> 0 01111111 00000000000000000000000 = 0x3F800000
If you do some bit shiftng operations on these numbers, you mix up the borders between sign, exponent and mantissa and so anything can, may and will happen.
In your case:
The latter you lose by adding -0.0 and -1.1754943508222875e-38, which becomes the latter, namely 0x80800000, which should be, after >>ing it by 1, 3.0 again. I don't know why it isn't, probably because I made a mistake here.
What stays is that you cannot do bit-shifting on floats an expect a reliable result.
I would consider converting them to integer or other fixed-point on the ARM and sending them over the line as they are.
Upvotes: 2
Reputation: 3970
Fixed point doesn't work that way. What you want to do is something like this:
void main(){
// initing 8bit fixed point numbers
unsigned int x = 2 << 8;
unsigned int y = 3 << 8;
unsigned int z = 1 << 8;
// adding two numbers
unsigned int a = x + y;
// multiplying two numbers with fixed point adjustment
unsigned int b = (x * y) >> 8;
// use numbers
printf("%d %d\n", a >> 8, b >> 8);
}
Upvotes: 1
Reputation: 94185
If you want to use fixed point, dont use type 'float
' or 'double
' because them has internal structure. Floats and Doubles have specific bit for sign; some bits for exponent, some for mantissa (take a look on color image here); so they inherently are floating point.
You should either program fixed point by hand storing data in integer type, or use some fixed-point library (or language extension).
There is a description of Floating point extensions implemented in GCC: http://gcc.gnu.org/onlinedocs/gcc/Fixed_002dPoint.html
There is some MACRO-based manual implementation of fixed-point for C: http://www.eetimes.com/discussion/other/4024639/Fixed-point-math-in-C
Upvotes: 3