clearseplex
clearseplex

Reputation: 729

Mean function incorrect value

I have an 80 element array with the same entries: 176.01977965813853

If I use the mean function I will get the value 176.01977965813842

Why is that?


Here is a minimal working example:

using Statistics
arr = fill(176.01977965813853, 80)

julia> mean(arr)
176.01977965813842

I expected this to return 176.01977965813853.

Upvotes: 1

Views: 132

Answers (2)

Mason
Mason

Reputation: 3041

The problem as I understand it can be reproduced as follows:

using Statistics
arr = fill(176.01977965813853, 80)

julia> mean(arr)
176.01977965813842

The reason for this is that julia does all floating point arithmetic with 64 bits of precision by default (i.e. the Float64 type). Float64s cannot represent any real number. There is a finite step between each floating point number and rounding errors are incurred when you do arithmetic on them. These rounding errors are usually fine, but if you're not careful, they can be catastrophic. For instance:

julia> 1e100 + 1.0 - 1e100
0.0

That says that if I do 10^100 + 1 - 10^100 I get zero! If you want to get an upper bound on the errors caused by floating point arithmetic, we can use IntervalArithmetic.jl:

using IntervalArithmetic

julia> 1e100 + interval(1.0) - 1e100
[0, 1.94267e+84]

That says that the operation 1e100 + 1.0 - 1e100 is at least equal to 0.0 and at most 1.94*10^84, so the error bounds are huge!

We can do the same for the operation you were interested in,

arr = fill(interval(176.01977965813853), 80);

julia> mean(arr)
[176.019, 176.02]

julia> mean(arr).lo
176.019779658138

julia> mean(arr).hi
176.0197796581391

which says that the actual mean could be at least 176.019779658138 or at most 176.0197796581391, but one can't be any more certain due to floating point error! So here, Float64 gave the answer with at most 10^-13 percent error, which is actually quite small.

What if those are unacceptable error bounds? Use more precision! You can use the big string macro to get arbitrary precision number literals:

arr = fill(interval(big"176.01977965813853"), 80);

julia> mean(arr).lo
176.0197796581385299999999999999999999999999999999999999999999999999999999999546

julia> mean(arr).hi
176.019779658138530000000000000000000000000000000000000000000000000000000000043

That calculation was done using 256 bits of precision, but you can get even more precision using the setprecision function:

setprecision(1000)
arr = fill(interval(big"176.01977965813853"), 80);

julia> mean(arr).lo
176.019779658138529999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999599

julia> mean(arr).hi
176.019779658138530000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000579

Note that arbitrary precision arithmetic is sloooow compared to Float64s, so it's usually best to just use arbitrary precision arithmetic to validate your results to make sure you're converging to a good result within your desired accuracy.

Upvotes: 3

Moritz Schauer
Moritz Schauer

Reputation: 1417

These are just expected floating point errors. But if you need very precise summations, you can use a a bit more elaborate (and costly) summation scheme:

julia> using KahanSummation
[ Info: Precompiling KahanSummation [8e2b3108-d4c1-50be-a7a2-16352aec75c3]

julia> sum_kbn(fill(176.01977965813853, 80))/80
176.01977965813853

Ref: Wikipedia

Upvotes: 4

Related Questions