user2820579

Reputation: 3451

Same program, different hardware, slightly different numerical results

I have a program that I run on two different PCs using the same version of Julia. I'd rather not post a specific snippet of the program because, on the one hand, it is a little complicated and, on the other, I feel the cause must be something that depends on the machine rather than on the program itself.

So here are some results I get on both machines:

Machine 1:

1  1.0000000000582077  15.124999999941792
2  1.9999999999417923  27.536972172616515
3  3.000000000523869   45.722282028989866

Machine 2:

1  1.0  15.125
2  2.0  27.53697217302397
3  3.0  45.722282028466

The problem is rounding errors: I don't know at which point they are introduced that makes the results slightly different. It cannot be a formatting issue (I use exactly the same program in both cases), and it cannot simply be machine epsilon (eps() is 2.220446049250313e-16 on both machines, and the differences above are larger than that).
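For concreteness, these are standard Float64 reference values (the same on any IEEE 754 machine, nothing specific to either PC):

eps()      # machine epsilon for Float64
# 2.220446049250313e-16
eps(45.7)  # spacing between adjacent Float64 values near the third-column results
# 7.105427357601002e-15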

At the moment I can only think that one of the extra packages I use, SparseArrays, LinearAlgebra or FFTW, is introducing this artificial error, but I'm not sure of it.


EDIT:

OK, so here goes a minimal example (perhaps the arrays don't need to be of the form [a, a], where a is itself a sparse array). I had to modify my program in order to isolate the problem, so the results I discuss below are slightly different from those I wrote above. Still, both show the non-negligible error I mention.

using LinearAlgebra
using SparseArrays
using FFTW

const Nsites = 10000

# Diagonal phase operator exp(-i p^2 / 4): p = 0..N-1 on the first half of the
# Nsites sites, p - N on the second half; duplicated on a 2-level factor via kron.
function Opbuild1(::Val{true})
    Up = spzeros(Complex{Float64}, Nsites, Nsites)
    N = div(Nsites,2)
    ii = 1
    for p in 0:(N-1)
        Up[ii,ii] = exp(-1.0im*p^2*0.25)
        Up[ii + N, ii + N] = exp(-1.0im*(p - N)^2*0.25)
        ii += 1
    end
    return kron(sparse(Matrix(1.0I, 2, 2)), Up)
end

# Diagonal momentum operator: p on the first half, p - N on the second; kron with I(2).
function Opbuild2(::Val{true})
    pop = spzeros(Float64, Nsites, Nsites)
    N = div(Nsites,2)
    ii = 1
    for p in 0:(N-1)
        pop[ii,ii] = p
        pop[ii+N,ii+N] = (p-N)
        ii += 1
    end
    return kron(sparse(Matrix(1.0I, 2, 2)), pop)
end

# Initial state: a single basis vector at site x = a[irealiz], tensored with (1, i)/sqrt(2).
function initst1(BoolN, irealiz)
    psi = spzeros(Complex{Float64}, Nsites)
    a = range(680,stop=783,step=1)
    x = a[irealiz]
    psi[x] = 1.0
    psi = kron([1.0,1.0im]/sqrt(2), psi)
    return psi, x
end

# Inverse FFT applied separately to each half of the vector, then renormalized.
function ifftcustom(array)
    array = Array(array)
    array1 = copy(array)
    array1[1:Nsites] = ifft(array[1:Nsites])
    array1[Nsites+1:end] = ifft(array[Nsites+1:end])
    return array1/norm(array1)
end

# Forward FFT applied separately to each half of the vector, then renormalized.
function fftcustom(array)
    array = Array(array)
    array1 = copy(array)
    array1[1:Nsites] = fft(array[1:Nsites])
    array1[Nsites+1:end] = fft(array[Nsites+1:end])
    return array1/norm(array1)
end

function main()
    boolN = rem(Nsites,2) == 0
    Operator = Opbuild1(Val(boolN))
    Operator_p = Opbuild2(Val(boolN))
    #
    psi0_inp, pinit = initst1(boolN, 1) #!
    psi0_inp = round.(ifftcustom(Operator*psi0_inp), digits=15) # (*)See below
    psi0_inp = round.(fftcustom(psi0_inp), digits=15)
    psi0_inp = round.(psi0_inp, digits=15)
    @show psi0_inp'*Operator_p*psi0_inp
end

main()

So the difference is the following. If at (*) I instead run the line psi0_inp = round.(ifftcustom(psi0_inp), digits=15), then in the @show line I obtain 679.000000000001 on both machines. On the other hand, if I run it as written in the code, on one machine I get 679.0000000000001 + 0.0im but on the other 679.0 + 1.2036944776504516e-16im.

I don't care about the 1e-16 imaginary part, because that is a "zero" in double precision, but the deviation of the real part from 679 is of order 1e-13, which is not quite a zero in double precision.
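For context, the spacing of double-precision values near 679 (again, standard Float64 facts rather than anything machine-specific):

eps(679.0)         # spacing between adjacent Float64 values near 679
# 1.1368683772161603e-13
nextfloat(679.0)   # the next representable value above 679.0
# 679.0000000000001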

Upvotes: 3

Views: 305

Answers (2)

StefanKarpinski

Reputation: 33249

If the machines have different CPU architectures, the results can differ because of different SIMD instructions, specifically a different parallel instruction-set width. The difference comes down to a few things:

  1. Unlike the pure mathematical operations that they approximate, floating-point operations like + and * are non-associative, i.e. (x + y) + z can be different from x + (y + z).

  2. In order to take advantage of SIMD parallel instructions, compilers will re-associate operations as a fairly essential optimization. Julia code requires explicitly opting into this kind of re-association, but high-performance libraries like BLAS (used for dense linear algebra), SuiteSparse (used for sparse linear algebra), and FFTW will definitely enable such optimizations at the cost of perfect reproducibility. If these libraries didn't allow SIMD optimization, they would be 2-8x slower, so they all do this.

  3. The associativity tree of numerical operations when implementing things like summations depends on the width of the SIMD instructions. If the "lane width" is W, then a sum is implemented as W parallel sums which are added together at the end. Since the width W can differ between CPU architectures, the associativity tree can differ, and thus the numerical results can differ (see the sketch after this list).
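For what it's worth, a minimal Julia sketch of points 1 and 3 (the lanesum helper and the lane widths 4 and 8 below are purely illustrative, not how BLAS or FFTW actually implement their sums):

(x, y, z) = (0.1, 0.2, 0.3)
(x + y) + z == x + (y + z)   # false: 0.6000000000000001 vs 0.6

# Toy "SIMD-style" sum: split the terms over W accumulators ("lanes") and
# combine the lanes at the end. Changing W changes the association order,
# and typically the last bits of the result.
function lanesum(v, W)
    lanes = zeros(eltype(v), W)
    for (i, x) in enumerate(v)
        lanes[mod1(i, W)] += x   # round-robin the terms over the lanes
    end
    return sum(lanes)
end

v = randn(10_000)
sum(v), lanesum(v, 4), lanesum(v, 8)   # usually agree only up to the last few bits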

Note that none of this is specific to Julia; it would also happen if the program were written in any language using these libraries (BLAS, SuiteSparse and FFTW).

Upvotes: 10

Kristoffer Carlsson

Reputation: 89

You give us quite little to go on, but it could be that you are running with forced bounds checking on one machine (--check-bounds=yes), which would inhibit certain optimizations like SIMD and can slightly change the result.
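For example, something like the following is worth comparing on both machines (script.jl here is just a stand-in for the actual program):

julia --check-bounds=yes script.jl   # bounds checks forced everywhere; blocks some SIMD optimizations
julia script.jl                      # default: bounds checks can be elided, e.g. via @inbounds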

Upvotes: 4
