Reputation: 1895
I'm trying to optimize the speed and allocation of this loop
using Random  # provides rand!

function loop(n,k)
    m = rand(10,n)
    r = rand(10,1)
    for j in 1:k
        for i in 2:size(m,2)
            # shift column i into column i-1, adding fresh noise
            @inbounds m[:,i-1] = m[:,i] + rand!(r)
        end
    end
end
The memory allocation is quite big: @time loop(10000,30) reports 599.94k allocations, and the count grows with k. I think there are two contributing factors: (1) allocation of m[:,i] and (2) allocation of m[:,i-1]. I was hoping that @inbounds would help, but it doesn't; removing @inbounds makes no difference to the allocation either.
Is there a way to reduce allocations? I'm not really creating any new objects, so the allocation should be invariant to k. I tried replacing @inbounds with @view, but it didn't even run. I don't think I can use broadcast! here.
Upvotes: 2
Views: 43
Reputation: 13800
Przemyslaw points out the main issues, but I see a further benefit from combining his ideas with your original idea of preallocating r and then filling it with the in-place rand!:
julia> using Random
julia> function loop3(n,k)
           m = rand(10,n)
           r = rand(10)
           for j in 1:k
               for i in 2:size(m,2)
                   @inbounds m[:,i-1] .= @view(m[:,i]) .+ rand!(r)
               end
           end
       end
loop3 (generic function with 1 method)
Which gives:
julia> @btime loop(1_000, 1_000)
266.851 ms (3485003 allocations: 327.64 MiB)
julia> @btime loop2(1_000, 1_000)
101.003 ms (999002 allocations: 152.51 MiB)
julia> @btime loop3(1_000, 1_000)
61.447 ms (3 allocations: 78.36 KiB)
which is now essentially just the allocation of m and r:
julia> @btime begin
           rand(10, 1_000); rand(10)
       end
15.881 μs (3 allocations: 78.36 KiB)
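To check that a single column update really is allocation-free, one option is to pull the loop body into a small function and measure it with @allocated. This is only a sketch: shift_col! is a hypothetical helper name, not something from the question, and I would expect the reported byte count to be zero on recent Julia versions, though that depends on the version you run.

```julia
using Random

# Hypothetical helper: the same in-place update used inside loop3,
# applied to a single column pair (i-1, i).
function shift_col!(m, r, i)
    @inbounds m[:, i-1] .= @view(m[:, i]) .+ rand!(r)
    return nothing
end

m = rand(10, 100)
r = rand(10)

shift_col!(m, r, 2)                      # warm-up call so compilation is not measured
bytes = @allocated shift_col!(m, r, 2)   # bytes allocated by one update
println("bytes allocated per update: ", bytes)
```

If the number printed is not zero, the usual suspects are a slice on the right-hand side without @view, or a plain = or + where .= and .+ were intended.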
Upvotes: 1
Reputation: 42264
Use views, since they do not materialize a copy of the slice, and use broadcasting with .= so the result is written in place.
function loop2(n,k)
    m = rand(10,n)
    for j in 1:k
        for i in 2:size(m,2)
            @inbounds m[:,i-1] .= @view(m[:,i]) .+ rand(10)
        end
    end
end
The new version is about 3x faster, makes about 3.5x fewer allocations, and uses less than half the memory:
julia> @btime loop(1000,1000)
227.918 ms (3485003 allocations: 327.64 MiB)
julia> @btime loop2(1000,1000)
75.154 ms (999002 allocations: 152.51 MiB)
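A minimal sketch of why the view matters, using a small made-up matrix rather than the one from the question: a slice on the right-hand side copies the column into a new array, while a view aliases the parent's memory, and a dotted assignment writes into the existing column instead of building a temporary.

```julia
# Small illustrative matrix (values chosen only for this example).
m = [1.0 3.0 5.0;
     2.0 4.0 6.0]

s = m[:, 2]         # slice on the right-hand side: allocates a copy of column 2
v = @view m[:, 2]   # view: no copy, refers to the same memory as m

m[1, 2] = 100.0     # modify the parent matrix
println(s)          # the copy does not see the change
println(v)          # the view does

# Fused broadcast: .+ and .= combine into one loop that writes
# directly into column 1 without an intermediate array.
m[:, 1] .= v .+ 1.0
println(m[:, 1])
```

The remaining allocations in loop2 come from rand(10), which creates a fresh vector on every iteration.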
Upvotes: 2