Reputation: 1163
I wrote the code below:
int vat = (int)(invoice.total * 0.08f);
Assume invoice.total = 36000. Then vat must be 2880, but it is 2879!
I changed my code to:
float v = invoice.total * 0.08f;
int vat = (int)v;
Now vat has the correct value (2880).
I wonder whether the parentheses () have higher precedence or not! Also, the float is exactly 2880.0, not a little less, so no rounding should happen!
Upvotes: 2
Views: 3060
Reputation: 61952
A float holds some "hidden" precision that is not shown. Try watching invoice.total.ToString("R"), and you will probably see that it is not exactly 36000.
Alternatively, this can be a result of your runtime choosing a "broader" storage location, like a 64-bit or 80-bit CPU register or similar, for the intermediate result invoice.total * 0.08f.
EDIT: You can get rid of the effects that arise from the runtime choosing an overly wide storage location by changing
(int)(invoice.total * 0.08f)
into
(int)(float)(invoice.total * 0.08f)
The extra cast, from float to float (sic!), looks like a no-op, but it does force the runtime to round and throw away that unwanted precision. This is poorly documented. [Will provide reference.] A related thread you might want to read: Are floating-point numbers consistent in C#? Can they be?
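To see the two behaviours side by side without depending on which JIT you happen to run, here is a minimal sketch as a C# 9 top-level program. It assumes that widening to double is a fair stand-in for a wide CPU register (the exact product of two floats fits in a double), and the helper names WideRegisterVat and ForcedSingleVat are hypothetical:

```csharp
using System;

// Emulates a runtime that keeps the intermediate product in a wider register:
// the double product of two floats is exact, and the int cast truncates it.
static int WideRegisterVat(float total, float rate) => (int)((double)total * rate);

// The extra (float) cast rounds the intermediate product to single precision
// before truncation, as in (int)(float)(invoice.total * 0.08f).
static int ForcedSingleVat(float total, float rate) => (int)(float)((double)total * rate);

Console.WriteLine(WideRegisterVat(36000f, 0.08f)); // 2879
Console.WriteLine(ForcedSingleVat(36000f, 0.08f)); // 2880
```

On a runtime that does float arithmetic in single precision you may never see 2879 from the original expression; the explicit double in the first helper is there only to make the wide case reproducible.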
Your example is actually archetypical, so I have decided to go a bit more into detail. This stuff is well described in the section Differences Among IEEE 754 Implementations which is written as an addendum (by an anonymous author) to David Goldberg's What Every Computer Scientist Should Know About Floating-Point Arithmetic. So suppose we have this code:
static int SO_24548957_I()
{
float t = 36000f; // exactly representable
float r = 0.08f; // this is not representable, rounded
float temporary = t * r;
int v = (int)temporary;
return v; // always(?) 2880
}
Everything seems fine, but we decide to refactor the temporary variable away, so we write:
static int SO_24548957_II()
{
float t = 36000f; // exactly representable
float r = 0.08f; // this is not representable, rounded
int v = (int)(t * r);
return v; // could be 2880 or 2879 depending on strange things
}
and Bang! The behavior of our program changes. You can see the change on most systems (at least on mine!) if you compile for platform x86 (or Any CPU with Prefer 32-bit selected). Whether optimizations are enabled (Release or Debug mode) could be relevant in theory, and the hardware architecture is certainly important too.
It is a complete surprise to many that both 2880 and 2879 can be correct answers on IEEE-754-compliant systems, but read the link I gave.
To elaborate on what is meant by "not representable", let us see what the C# compiler must do when it encounters the symbol 0.08f. Because of the way float (32-bit binary floating point) works, we have to choose between:
10737418 / 2**27 == 0.079 999 998 2...
and
10737419 / 2**27 == 0.080 000 005 6...
where ** means exponentiation (i.e. "to the power of"). Since the first one is nearer to the desired mathematical value, we must choose that one. So the actual value is a bit smaller than the desired one. Now, when we do the multiplication and want to store the result in a Single again, we must also, as part of the multiplication algorithm, round again to get the representable value closest to the exact "mathematical" product of the (actual) factors 36000 and 0.0799999982... In this case you are lucky: the nearest Single is exactly 2880, so the multiplication in our case involves a round-up to this value.
Therefore the first code example above gives 2880.
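The rounding argument above can be checked with exact integer arithmetic. The following sketch is a C# 9 top-level program that needs .NET Core 2.0 or later for BitConverter.SingleToInt32Bits; the helper Decompose is hypothetical and assumes a positive, normal float. It reads the bits of 0.08f, then compares the exact product with 2880 in units of 2^-27:

```csharp
using System;

// Splits a positive, normal float into an integer significand (24 bits, implicit 1 restored)
// and a power-of-two exponent, so that f == significand * 2^exponent exactly.
static (long Significand, int Exponent) Decompose(float f)
{
    int bits = BitConverter.SingleToInt32Bits(f);
    int biasedExponent = (bits >> 23) & 0xFF;
    long significand = (bits & 0x7FFFFF) | (1 << 23);
    return (significand, biasedExponent - 127 - 23);
}

var (sig, exp) = Decompose(0.08f);
Console.WriteLine($"0.08f == {sig} / 2^{-exp}"); // 0.08f == 10737418 / 2^27

// Work in units of 2^-27 so every quantity is an exact integer.
long exactProduct = 36000L * sig;       // 36000 * 0.08f, with no rounding
long target = 2880L << -exp;            // 2880
long shortfall = target - exactProduct; // how far the true product is below 2880

// 2880 lies in [2^11, 2^12), where adjacent floats are 2^(11 - 23) = 2^-12 apart.
long halfUlp = (1L << (-exp - 12)) / 2;
Console.WriteLine($"shortfall = {shortfall}, half ulp = {halfUlp}"); // shortfall = 8640, half ulp = 16384
Console.WriteLine(shortfall < halfUlp
    ? "the nearest float to the exact product is 2880"
    : "the nearest float to the exact product is below 2880");
```

Because the true product falls short of 2880 by less than half the gap between neighbouring floats, rounding it to a Single lands exactly on 2880.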
However, in the second code example above, the multiplication might be done (at the choice of the runtime; we cannot really help that) in CPU hardware that handles more bits (64 or 80, typically). In that case, the product of two 32-bit floats like ours can be calculated without rounding the end result, because 64 or 80 bits are more than enough to hold the full product of two 32-bit floats (their 24-bit significands multiply to at most 48 bits). So this product is clearly smaller than 2880, since 0.0799999982... is less than 0.08.
Therefore the second code example above could return 2879.
For comparison, this code:
static int SO_24548957_III()
{
float t = 36000f; // exactly representable
float r = 0.08f; // this is not representable, rounded
double temporary = t * (double)r;
int v = (int)temporary;
return v; // always(?) 2879
}
always gives 2879, because we explicitly tell the compiler to convert the Single to Double, which means appending a bunch of binary zeroes, so we get the 2879 case with certainty.
Lessons learned: (1) With binary floating point, factoring out a subexpression into a temporary variable might change the result. (2) With binary floating point, C# compiler settings like x86 vs. x64 might change the result.
Of course, as everybody says everywhere, do not use float or double for monetary applications; use decimal there.
Upvotes: 2
Reputation:
Just an addendum to Jeppe's and David's answers regarding the compiler choosing a different precision for an intermediate value.
Your first expression, written in a function like:
static int Calc1(int value)
{
float v = value * 0.08f;
return (int) v;
}
will result in the following IL code:
.method private hidebysig static int32 Calc1(int32 'value') cil managed
{
// Code size 12 (0xc)
.maxstack 2
.locals init ([0] float32 v)
IL_0000: ldarg.0
IL_0001: conv.r4
IL_0002: ldc.r4 7.9999998e-002
IL_0007: mul
IL_0008: stloc.0
IL_0009: ldloc.0
IL_000a: conv.i4
IL_000b: ret
} // end of method Program::Calc1
Note that the instructions stloc.0 and ldloc.0 convert the multiplication result to a float before the final conversion to an int (conv.i4) takes place.
Now let's look at your second expression:
static int Calc2(int value)
{
return (int)(value * 0.08f);
}
and the corresponding IL code:
.method private hidebysig static int32 Calc2(int32 'value') cil managed
{
// Code size 10 (0xa)
.maxstack 8
IL_0000: ldarg.0
IL_0001: conv.r4
IL_0002: ldc.r4 7.9999998e-002
IL_0007: mul
IL_0008: conv.i4
IL_0009: ret
} // end of method Program::Calc2
Note that the result of the multiplication is directly converted to an int.
The multiplication result has the precision as provided by the floating point CPU instructions chosen by the JIT compiler, which most likely will exceed the precision of the float format. Thus, the first code incurs an additional loss of precision due to the float conversion of the multiplication result. The second code does not suffer from this additional precision loss, as it avoids the intermediate float conversion.
(Actually, for the first code example the JIT compiler might be smart enough to instruct the CPU to do floating point arithmetic with single precision only, thus already doing the multiplication with the low single precision.)
You might want to argue that the stloc.0/ldloc.0 combo in the IL code of the first example is pointless and should be optimized away if only the compiler were smart enough. Alas, this is not the case. Look again at the C# code of the first example: the source code explicitly demands that the multiplication result be converted into a float value (via the variable v). The stloc.0/ldloc.0 combo is merely the way the compiler chose to adhere to this demanded float conversion.
Upvotes: 0
Reputation: 612934
0.08f is not exactly representable. The closest single precision value is
0.07999999821186065673828125
So you actually calculate
36000 * 0.07999999821186065673828125
which is just a little less than 2880. You then truncate the value, and hence receive the value 2879.
This might be the first time you have encountered an issue like this, but I bet you were not expecting the actual value of 0.08f to be 0.07999999821186065673828125.
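A quick way to confirm the exact value quoted above, sketched as a C# 9 top-level program: widening a float to double is exact, and the full decimal expansion is itself exactly representable as a double, so a plain equality test is fair. The variable names are illustrative only:

```csharp
using System;

float rate = 0.08f;
// Widening float -> double is exact, so 'actual' is precisely the value stored for 0.08f.
double actual = rate;
Console.WriteLine(actual == 0.07999999821186065673828125); // True
Console.WriteLine(actual < 0.08);                           // True

// The exact product needs at most 35 significant bits, so a double holds it without rounding.
double product = 36000 * actual;
Console.WriteLine(product < 2880);  // True
Console.WriteLine((int)product);    // 2879: the cast truncates toward zero
```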
Consider this variant:
float f = 36000 * 0.08f;
Console.WriteLine((int)f);
double d1 = 36000 * 0.08f;
Console.WriteLine((int)d1);
double d2 = 36000 * 0.08d;
Console.WriteLine((int)d2);
which outputs
2880
2879
2880
Why do your two variants behave differently? Because the compiler is choosing to store the intermediate value invoice.total * 0.08f at a precision other than single.
Clearly you are playing with fire here. This behaviour is all down to a fundamental property of floating-point arithmetic: your choice of binary floating point inevitably leads to issues like this. One way to get around this is to round the values to the nearest integer rather than truncating them.
float f = 36000 * 0.08f;
Console.WriteLine((int)Math.Round(f));
double d1 = 36000 * 0.08f;
Console.WriteLine((int)Math.Round(d1));
double d2 = 36000 * 0.08d;
Console.WriteLine((int)Math.Round(d2));
which results in
2880
2880
2880
You might also consider using Decimal for calculations like this. That way you operate on decimal rather than binary representations, so you will be able to represent all these values exactly.
int vat = (int)(36000 * 0.08m);
Console.WriteLine(vat);
which outputs
2880
Exactly how to solve the problem depends very much on the details of the calculation and your business logic. But the fundamental issue is that binary floating point cannot represent your calculations exactly.
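As a sketch only: assuming amounts are held as decimal, VAT is rounded to whole currency units, and midpoints round away from zero (all three are assumptions about your business rules, not something the question states), a hypothetical ComputeVat helper could look like this:

```csharp
using System;

// Hypothetical helper: VAT on a decimal amount, rounded to whole currency units.
// MidpointRounding.AwayFromZero is an assumed business rule (Math.Round defaults to ToEven).
static decimal ComputeVat(decimal total, decimal rate) =>
    Math.Round(total * rate, 0, MidpointRounding.AwayFromZero);

decimal vat1 = ComputeVat(36000m, 0.08m);    // 36000 * 0.08 is exactly 2880, so the value is 2880
decimal vat2 = ComputeVat(36006.25m, 0.08m); // 2880.5 rounds to 2881 (ToEven would give 2880)
Console.WriteLine(vat1);
Console.WriteLine(vat2);
```

Using Math.Round instead of a cast makes the rounding policy explicit; whether midpoints should go away from zero or to even is a business decision, which is exactly why the right fix depends on your rules.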
Upvotes: 1