Reputation: 411
I just finished studying Julia (and most importantly the performance tips!). I learned that using non-constant global variables makes code slower, and that the counter-measure is to pass as many variables as possible as function arguments. So I ran the following test:
x = 10.5 #these are globals
y = 10.5
function bench1() #acts on globals
    z = 0.0
    for i in 1:100
        z += x^y
    end
    return z
end
function bench2(x, y)
    z = 0.0
    for i in 1:100
        z += x^y
    end
    return z
end
function bench3(x::Float64, y::Float64) #acts on arguments
    z::Float64 = 0.0
    for i in 1:100
        z += x^y
    end
    return z
end
@time [bench1() for j in 1:100]
@time [bench2(x,y) for j in 1:100]
@time [bench3(x,y) for j in 1:100]
I have to admit that the results were completely unexpected and not in agreement with what I had read. Results:
0.001623 seconds (20.00 k allocations: 313.375 KB)
0.003628 seconds (2.00 k allocations: 96.371 KB)
0.002633 seconds (252 allocations: 10.469 KB)
On average, the first function, which acts on the global variables directly, is consistently faster by about a factor of 2 than the last function, which has all the type annotations AND does not touch the globals at all. Can someone explain to me why?
Upvotes: 1
Views: 295
Reputation: 5325
One more problem is that the following are still in global scope:
@time [bench1() for j in 1:100]
@time [bench2(x,y) for j in 1:100]
@time [bench3(x,y) for j in 1:100]
as you can see from the still huge number of allocations reported by @time.
Wrap all these in a function:
function runbench(N)
    x = 3.0
    y = 4.0
    @time [bench1() for j in 1:N]
    @time [bench2(x,y) for j in 1:N]
    @time [bench3(x,y) for j in 1:N]
end
Warm up with runbench(1), then for runbench(10^5) I get:
1.425985 seconds (20.00 M allocations: 305.939 MB, 9.93% gc time)
0.061171 seconds (2 allocations: 781.313 KB)
0.062037 seconds (2 allocations: 781.313 KB)
The total memory allocated in cases 2 and 3 is 10^5 times 8 bytes, as expected.
The moral is to largely ignore the raw timings and look instead at the memory allocations, which is where the information about type stability shows up.
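You can also inspect the type instability directly rather than inferring it from allocations: @code_warntype prints the code with the types the compiler inferred, and any non-concrete type (shown as Any) flags the problem. A minimal sketch reproducing the questioner's setup:

```julia
using InteractiveUtils  # provides @code_warntype outside the REPL

x = 10.5  # non-const global: its type could change at any time,
y = 10.5  # so the compiler cannot assume Float64 inside bench1

function bench1()
    z = 0.0
    for i in 1:100
        z += x^y  # x and y are inferred as Any here
    end
    return z
end

# Prints the lowered code with inferred types; look for `Any`.
@code_warntype bench1()
```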
EDIT: bench3 is an "anti-pattern" in Julia (i.e. a style of coding that should not be used) -- you should never annotate types merely in an attempt to fix type instabilities; that is not what type annotations are for in Julia.
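The idiomatic fixes are either to pass the globals as arguments (as bench2 does) or to declare them const, which tells the compiler the binding's type will never change. A sketch of the const variant (names cx, cy, bench1_const are my own, for illustration):

```julia
const cx = 10.5  # `const` fixes the binding's type,
const cy = 10.5  # restoring type stability without any annotations

function bench1_const()
    z = 0.0
    for i in 1:100
        z += cx^cy  # now inferred as Float64
    end
    return z
end
```

With const globals, bench1_const should allocate like bench2 and bench3, even though it still reads globals directly.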
Upvotes: 8
Reputation: 7385
I guess this is mainly because of compilation time. If I change the "main" code to
N = 10^2
println("N = $N")
println("bench1")
@time [bench1() for j in 1:N]
@time [bench1() for j in 1:N]
println("bench2")
@time [bench2(x,y) for j in 1:N]
@time [bench2(x,y) for j in 1:N]
it gives
N = 100
bench1
0.004219 seconds (21.46 k allocations: 376.536 KB)
0.001792 seconds (20.30 k allocations: 322.781 KB)
bench2
0.006218 seconds (2.29 k allocations: 105.840 KB)
0.000914 seconds (402 allocations: 11.844 KB)
So in the second measurement, bench1() is slower than bench2() by a factor of ~2. (I omitted bench3() because it gives the same results as bench2().) If we increase N to 10^5, the compilation time becomes negligible compared to the calculation time, so we can see the expected speedup for bench2() even in the first measurement.
N = 100000
bench1
1.767392 seconds (20.70 M allocations: 321.219 MB, 8.25% gc time)
1.720564 seconds (20.70 M allocations: 321.166 MB, 6.26% gc time)
bench2
0.923315 seconds (799.85 k allocations: 17.608 MB, 0.96% gc time)
0.922132 seconds (797.96 k allocations: 17.517 MB, 1.08% gc time)
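Instead of running @time twice by hand to discard the compilation run, the BenchmarkTools.jl package's @btime macro warms up and takes many samples, so compilation never appears in the reported time; interpolating globals with $ also keeps the global-lookup cost out of the measurement. A sketch, assuming BenchmarkTools.jl is installed:

```julia
using BenchmarkTools

x = 10.5
y = 10.5

function bench2(x, y)
    z = 0.0
    for i in 1:100
        z += x^y
    end
    return z
end

# `$` interpolates the globals' current values into the benchmark
# expression, so the timing reflects bench2 itself, not global access.
@btime bench2($x, $y)
```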
Upvotes: 6