George Datseris

Reputation: 411

Acting on globals faster than passing them as arguments? [julia-lang]

I just finished studying Julia (and, most importantly, the performance tips!). I learned that using global variables makes code slower, and that the counter-measure is to pass as many variables as possible as arguments to functions. Therefore I ran the following test:

x = 10.5  #these are globals
y = 10.5

function bench1()  #acts on global
  z = 0.0
  for i in 1:100
    z += x^y
  end
  return z
end

function bench2(x, y)
  z = 0.0
  for i in 1:100
    z += x^y
  end
  return z
end

function bench3(x::Float64, y::Float64) #acts on arguments
  z::Float64 = 0.0
  for i in 1:100
    z += x^y
  end
  return z
end

@time [bench1() for j in 1:100]
@time [bench2(x,y) for j in 1:100]
@time [bench3(x,y) for j in 1:100]

I have to admit that the results were completely unexpected and do not agree with what I have read. Results:

0.001623 seconds (20.00 k allocations: 313.375 KB)
0.003628 seconds (2.00 k allocations: 96.371 KB)
0.002633 seconds (252 allocations: 10.469 KB)

On average, the first function, which acts directly on global variables, is consistently about a factor of 2 faster than the last function, which has all the type declarations AND does not act on globals at all. Can someone explain why?

Upvotes: 1

Views: 295

Answers (2)

David P. Sanders

Reputation: 5325

One more problem is that the following are still in global scope:

@time [bench1() for j in 1:100]
@time [bench2(x,y) for j in 1:100]
@time [bench3(x,y) for j in 1:100]

as you can see from the large numbers of allocations that @time still reports.

Wrap all these in a function:

function runbench(N)
    x = 3.0
    y = 4.0
    @time [bench1() for j in 1:N]
    @time [bench2(x,y) for j in 1:N]
    @time [bench3(x,y) for j in 1:N]
end

Warm up with runbench(1) to compile everything; then for runbench(10^5) I get

1.425985 seconds (20.00 M allocations: 305.939 MB, 9.93% gc time)
0.061171 seconds (2 allocations: 781.313 KB)
0.062037 seconds (2 allocations: 781.313 KB)

The total memory allocated in cases 2 and 3 is 10^5 times 8 bytes, as expected.

The moral is to almost ignore the actual timings and instead look at the memory allocations, which is where the information about type stability shows up.
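
If you want to see the type instability itself rather than infer it from the allocation counts, one option (not used in this answer) is Julia's @code_warntype macro, which highlights any values whose type cannot be inferred:

@code_warntype bench1()       # x^y and z show up as Any, because x and y are non-const globals
@code_warntype bench2(x, y)   # everything is inferred as Float64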

EDIT: bench3 is an "anti-pattern" in Julia (i.e. a style of coding that should not be used): you should never annotate types merely in the hope of fixing a type instability; that is not what type annotations are for in Julia.
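
For completeness, here is a sketch of two idiomatic alternatives (the names cx, cy and bench1_const are hypothetical, not from the original post): either pass the values as plain, untyped arguments as in bench2, or, if a global really is needed, declare it const so that its type can never change:

const cx = 10.5   # hypothetical const globals; their type is fixed, so the compiler can specialize
const cy = 10.5

function bench1_const()   # hypothetical variant of bench1 acting on const globals
    z = 0.0
    for i in 1:100
        z += cx^cy        # type stable: cx and cy are known to be Float64
    end
    return z
end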

Upvotes: 8

roygvib

Reputation: 7385

I guess this is mainly because of compilation time. If I change the "main" code as follows:

N = 10^2
println("N = $N") 

println("bench1")
@time [bench1() for j in 1:N]
@time [bench1() for j in 1:N]

println("bench2")
@time [bench2(x,y) for j in 1:N]
@time [bench2(x,y) for j in 1:N]

it gives

N = 100
bench1
  0.004219 seconds (21.46 k allocations: 376.536 KB)
  0.001792 seconds (20.30 k allocations: 322.781 KB)
bench2
  0.006218 seconds (2.29 k allocations: 105.840 KB)
  0.000914 seconds (402 allocations: 11.844 KB)

So in the second measurement, bench1() is slower than bench2() by a factor of ~2. (I omitted bench3() because it gives the same results as bench2().) If we increase N to 10^5, the compilation time becomes negligible compared to the calculation time, so we can see the expected speedup for bench2() even in the first measurement.

N = 100000
bench1
  1.767392 seconds (20.70 M allocations: 321.219 MB, 8.25% gc time)
  1.720564 seconds (20.70 M allocations: 321.166 MB, 6.26% gc time)
bench2
  0.923315 seconds (799.85 k allocations: 17.608 MB, 0.96% gc time)
  0.922132 seconds (797.96 k allocations: 17.517 MB, 1.08% gc time)
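
As a side note (not part of the original post), the BenchmarkTools.jl package sidesteps both issues at once: @btime compiles the expression first and then runs it many times, and interpolating globals with $ keeps the global lookup out of the measurement. A minimal sketch:

using BenchmarkTools

x = 10.5
y = 10.5

@btime bench1()          # still slow: the loop body itself reads non-const globals
@btime bench2($x, $y)    # fast: $x and $y are interpolated, so only the call is timed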

Upvotes: 6
