Reputation: 3451
I have a program that I run on two different PCs using the same version of Julia. I would rather not post a specific snippet of the program because, on the one hand, it is a little complicated and, on the other, I feel there must be something that depends on the machine rather than on my program itself.
So here are some results I get on the two machines.
On one machine:
1 1.0000000000582077 15.124999999941792
2 1.9999999999417923 27.536972172616515
3 3.000000000523869 45.722282028989866
And on the other:
1 1.0 15.125
2 2.0 27.53697217302397
3 3.0 45.722282028466
The problem is the rounding errors: I don't know at which point they are introduced that makes the results slightly different. It cannot be a formatting error (I use exactly the same program in both cases), and it cannot be a machine-epsilon error (the machine epsilon in both cases is 2.220446049250313e-16).
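For reference, this is straightforward to confirm on each machine from the REPL:

eps(Float64)   # 2.220446049250313e-16 on both machines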
At the moment I can only think that one of the extra packages I use, SparseArrays, LinearAlgebra, or FFTW, is introducing this artificial error, but I'm not sure of it.
EDIT:
OK, so here goes a minimal example (perhaps the arrays don't need to be of the form [a, a], where a is a sparse array itself). I had to modify my program in order to isolate the problem, so the results I discuss below are slightly different from those I wrote above. Still, both show the non-negligible error I mention.
using LinearAlgebra
using SparseArrays
using FFTW

const Nsites = 10000

# Diagonal evolution operator exp(-i p^2 / 4) on momenta p = 0,...,N-1 and p - N,
# duplicated on the two components via the Kronecker product with the 2x2 identity.
function Opbuild1(::Val{true})
    Up = spzeros(Complex{Float64}, Nsites, Nsites)
    N = div(Nsites, 2)
    ii = 1
    for p in 0:(N - 1)
        Up[ii, ii] = exp(-1.0im * p^2 * 0.25)
        Up[ii + N, ii + N] = exp(-1.0im * (p - N)^2 * 0.25)
        ii += 1
    end
    return kron(sparse(Matrix(1.0I, 2, 2)), Up)
end

# Diagonal momentum operator with eigenvalues p and p - N.
function Opbuild2(::Val{true})
    pop = spzeros(Float64, Nsites, Nsites)
    N = div(Nsites, 2)
    ii = 1
    for p in 0:(N - 1)
        pop[ii, ii] = p
        pop[ii + N, ii + N] = p - N
        ii += 1
    end
    return kron(sparse(Matrix(1.0I, 2, 2)), pop)
end

# Initial state: a single basis state selected by irealiz,
# tensored with the spinor [1, i]/sqrt(2).
function initst1(BoolN, irealiz)
    psi = spzeros(Complex{Float64}, Nsites)
    a = range(680, stop=783, step=1)
    x = a[irealiz]
    psi[x] = 1.0
    psi = kron([1.0, 1.0im] / sqrt(2), psi)
    return psi, x
end

# Inverse FFT applied separately to each half of the vector, then renormalized.
function ifftcustom(array)
    array = Array(array)
    array1 = copy(array)
    array1[1:Nsites] = ifft(array[1:Nsites])
    array1[Nsites+1:end] = ifft(array[Nsites+1:end])
    return array1 / norm(array1)
end

# Forward FFT applied separately to each half of the vector, then renormalized.
function fftcustom(array)
    array = Array(array)
    array1 = copy(array)
    array1[1:Nsites] = fft(array[1:Nsites])
    array1[Nsites+1:end] = fft(array[Nsites+1:end])
    return array1 / norm(array1)
end

function main()
    boolN = rem(Nsites, 2) == 0
    Operator = Opbuild1(Val(boolN))
    Operator_p = Opbuild2(Val(boolN))
    #
    psi0_inp, pinit = initst1(boolN, 1) #!
    psi0_inp = round.(ifftcustom(Operator * psi0_inp), digits=15) # (*) see below
    psi0_inp = round.(fftcustom(psi0_inp), digits=15)
    psi0_inp = round.(psi0_inp, digits=15)
    @show psi0_inp' * Operator_p * psi0_inp
end

main()
So the difference is the following. If at (*) I instead run the line

psi0_inp = round.(ifftcustom(psi0_inp), digits=15)

then in the @show output I obtain 679.000000000001 on both machines. On the other hand, if I run it as written in the code above, on one machine I get 679.0000000000001 + 0.0im but on the other 679.0 + 1.2036944776504516e-16im.
I don't care about the 1e-16 imaginary part, because that is a "zero" in double precision, but the deviation in the real part is of the order of 1e-13, which is not quite a zero in double precision.
Upvotes: 3
Views: 305
Reputation: 33249
If the machines have different CPU architectures, the results can differ due to different SIMD instructions, specifically different parallel instruction widths. The difference comes down to a few things:
Unlike the pure mathematical operations they approximate, floating-point operations like + and * are non-associative, i.e. (x + y) + z can differ from x + (y + z).
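For example (the exact values are not important, only that they differ):

(0.1 + 0.2) + 0.3   # 0.6000000000000001
0.1 + (0.2 + 0.3)   # 0.6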
In order to take advantage of SIMD parallel instructions, compilers will re-associate operations as a fairly essential optimization. Julia code requires explicitly opting into this kind of re-association, but high-performance libraries like BLAS (used for dense linear algebra), SuiteSparse (used for sparse linear algebra), and FFTW will definitely enable such optimizations at the cost of perfect reproducibility. If these libraries didn't allow SIMD optimization, they would be 2-8x slower, so they all do this.
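As an illustration of that opt-in (a minimal sketch unrelated to the code in the question; sum_plain and sum_simd are made-up names): a plain loop keeps the written left-to-right association, while @simd gives the compiler permission to re-associate the reduction and vectorize it, so the two sums may differ in the last bits.

function sum_plain(x)
    s = 0.0
    for v in x          # strictly left-to-right, never re-associated
        s += v
    end
    return s
end

function sum_simd(x)
    s = 0.0
    @simd for i in eachindex(x)   # allows re-association of the reduction
        @inbounds s += x[i]
    end
    return s
end

x = rand(10^6)
sum_plain(x) == sum_simd(x)   # often false; the difference is tiny but nonzero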
The associativity tree of numerical operations when implementing things like summations depends on the width of the SIMD instructions. If the "lane width" is W, then a sum is implemented as W parallel sums, which are added together at the end. Since the width W can be different for different CPU architectures, the associativity tree can be different and thus the numerical results can be different.
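A toy model of that width dependence (a sketch of the idea only; lane_sum is a made-up helper, and real BLAS/FFTW kernels are more involved): splitting the same sum into W running partial sums and combining them at the end generally gives slightly different results for different W.

# W-lane sum: element i goes to lane ((i - 1) mod W) + 1; lanes are combined at the end.
function lane_sum(x, W)
    acc = zeros(W)
    for i in eachindex(x)
        acc[1 + (i - 1) % W] += x[i]
    end
    return sum(acc)
end

x = rand(10^6)
lane_sum(x, 4), lane_sum(x, 8), sum(x)   # the three results generally differ in the last digits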
Note that none of this is specific to Julia; it would also happen if the program were written in any other language using these libraries (BLAS, SuiteSparse, and FFTW).
Upvotes: 10
Reputation: 89
You give us rather little to go on, but it could be that you are running with forced bounds checking on one machine (--check-bounds=yes), which would inhibit certain optimizations like SIMD and can slightly change the result.
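One way to rule that out is to compare the effective setting on both machines; Base.JLOptions is an internal API, so treat this as a quick sketch rather than a stable interface:

# 0 = default (respect @inbounds), 1 = forced on (--check-bounds=yes), 2 = forced off
Base.JLOptions().check_bounds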
Upvotes: 4