Reputation: 729
I have an 80 element array with the same entries: 176.01977965813853
If I use the mean function I will get the value 176.01977965813842
Why is that?
Here is a minimal working example:
using Statistics
arr = fill(176.01977965813853, 80)
julia> mean(arr)
176.01977965813842
I expected this to return 176.01977965813853
.
Upvotes: 1
Views: 132
Reputation: 3041
The problem as I understand it can be reproduced as follows:
using Statistics
arr = fill(176.01977965813853, 80)
julia> mean(arr)
176.01977965813842
The reason for this is that julia does all floating point arithmetic with 64 bits of precision by default (i.e. the Float64
type). Float64
s cannot represent any real number. There is a finite step between each floating point number and rounding errors are incurred when you do arithmetic on them. These rounding errors are usually fine, but if you're not careful, they can be catastrophic. For instance:
julia> 1e100 + 1.0 - 1e100
0.0
That says that if I do 10^100 + 1 - 10^100
I get zero! If you want to get an upper bound on the errors caused by floating point arithmetic, we can use IntervalArithmetic.jl:
using IntervalArithmetic
julia> 1e100 + interval(1.0) - 1e100
[0, 1.94267e+84]
That says that the operation 1e100 + 1.0 - 1e100
is at least equal to 0.0
and at most 1.94*10^84
, so the error bounds are huge!
We can do the same for the operation you were interested in,
arr = fill(interval(176.01977965813853), 80);
julia> mean(arr)
[176.019, 176.02]
julia> mean(arr).lo
176.019779658138
julia> mean(arr).hi
176.0197796581391
which says that the actual mean could be at least 176.019779658138
or at most 176.0197796581391
, but one can't be any more certain due to floating point error! So here, Float64
gave the answer with at most 10^-13
percent error, which is actually quite small.
What if those are unacceptable error bounds? Use more precision! You can use the big
string macro to get arbitrary precision number literals:
arr = fill(interval(big"176.01977965813853"), 80);
julia> mean(arr).lo
176.0197796581385299999999999999999999999999999999999999999999999999999999999546
julia> mean(arr).hi
176.019779658138530000000000000000000000000000000000000000000000000000000000043
That calculation was done using 256 bits of precision, but you can get even more precision using the setprecision
function:
setprecision(1000)
arr = fill(interval(big"176.01977965813853"), 80);
julia> mean(arr).lo
176.019779658138529999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999599
julia> mean(arr).hi
176.019779658138530000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000579
Note that arbitrary precision arithmetic is sloooow compared to Float64
s, so it's usually best to just use arbitrary precision arithmetic to validate your results to make sure you're converging to a good result within your desired accuracy.
Upvotes: 3
Reputation: 1417
These are just expected floating point errors. But if you need very precise summations, you can use a a bit more elaborate (and costly) summation scheme:
julia> using KahanSummation
[ Info: Precompiling KahanSummation [8e2b3108-d4c1-50be-a7a2-16352aec75c3]
julia> sum_kbn(fill(176.01977965813853, 80))/80
176.01977965813853
Ref: Wikipedia
Upvotes: 4