Reputation: 479
I want to multiply several matrices (all of the same dimensions) by a vector beta. I have tried two ways of storing the matrices: as a vector of matrices, and as a three-dimensional array.
Is it normal that the vector-of-matrices version runs faster?
using BenchmarkTools
nid = 10000
npar = 5
x1 = reshape(repeat([1], nid * npar), nid, :)  # nid × npar matrix of 1s
x2 = reshape(repeat([2], nid * npar), nid, :)
x3 = reshape(repeat([3], nid * npar), nid, :)
X = reshape([x1 x2 x3], nid, npar, :);  # 3-D array: nid × npar × 3
X1 = [x1, x2, x3]                       # vector of three matrices
beta = rand(npar)
function f(X::Array{Int,3}, beta::Vector{Float64})::Array{Float64}
    hcat([X[:, :, i] * beta for i in 1:size(X, 3)]...)
end
function g(X::Array{Array{Int,2},1}, beta::Vector{Float64})::Array{Float64}
    hcat([X[i] * beta for i in 1:size(X, 1)]...)
end
f(X,beta);
g(X1,beta);
@benchmark f(X, beta)
@benchmark g(X1, beta)
Results indicate that f takes almost twice as long as g. Is this a normal pattern, or am I not using the 3-D array properly?
Upvotes: 1
Views: 744
Reputation: 6423
That's because the slicing operator X[:, :, i] copies each matrix, hence the extra memory allocations.
Notice the memory estimate on the last line of each benchmark:
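A minimal demonstration of that copy-vs-view difference (this snippet is not part of the original answer; the names A, c, v are made up):

```julia
# Plain indexing materializes a fresh matrix; a view aliases the parent array.
A = fill(1, 2, 2, 2)
c = A[:, :, 1]        # ordinary slice: allocates and copies
v = @view A[:, :, 1]  # view: no copy, just a reference into A
A[1, 1, 1] = 99       # mutate the parent array
c[1, 1]               # still 1: the copy is unaffected
v[1, 1]               # 99: the view sees the change
```

Every X[:, :, i] inside f pays that copy cost, which is where the extra allocations come from.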
julia> @benchmark f(X, beta)
BenchmarkTools.Trial: 8011 samples with 1 evaluation.
Range (min … max): 356.013 μs … 4.076 ms ┊ GC (min … max): 0.00% … 87.21%
Time (median): 457.231 μs ┊ GC (median): 0.00%
Time (mean ± σ): 615.235 μs ± 351.236 μs ┊ GC (mean ± σ): 6.54% ± 11.69%
▃▇██▆▅▄▄▄▃▂ ▂▃▄▄▄▃▁▁▁ ▁ ▁ ▂
████████████▆▇▅▅▄▄▇███████████████▇█▇▇█▇▆▇█▆▆▆▆▅▆▅▅▄▃▃▂▂▄▂▄▄▄ █
356 μs Histogram: log(frequency) by time 1.96 ms <
Memory estimate: 1.60 MiB, allocs estimate: 17.
julia> @benchmark g(X1, beta)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 174.348 μs … 2.493 ms ┊ GC (min … max): 0.00% … 83.85%
Time (median): 219.383 μs ┊ GC (median): 0.00%
Time (mean ± σ): 245.192 μs ± 119.612 μs ┊ GC (mean ± σ): 3.54% ± 7.68%
▃▇█▂
▅████▆▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
174 μs Histogram: frequency by time 974 μs <
Memory estimate: 469.25 KiB, allocs estimate: 11.
To avoid the copies, use the @views macro, which turns each slice into a view (a reference into the original array) without allocating. The two implementations then perform the same, up to random noise:
function fbis(X::Array{Int,3}, beta::Vector{Float64})::Array{Float64}
    @views hcat([X[:, :, i] * beta for i in 1:size(X, 3)]...)
end
julia> @benchmark fbis(X, beta)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 175.984 μs … 2.710 ms ┊ GC (min … max): 0.00% … 79.70%
Time (median): 225.990 μs ┊ GC (median): 0.00%
Time (mean ± σ): 274.611 μs ± 166.015 μs ┊ GC (mean ± σ): 4.17% ± 7.78%
▅█▃
▆███▆▄▃▄▄▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
176 μs Histogram: frequency by time 1.15 ms <
Memory estimate: 469.25 KiB, allocs estimate: 11.
While using views improves the benchmark in this case, be careful not to overuse them. If you are "creating" a matrix that you are going to use over and over again, the one-time cost of copying it may become negligible compared to the cost of repeatedly accessing scattered memory locations through a reference.
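For completeness, the remaining per-slice allocations can also be avoided by preallocating the output and filling each column in place with `mul!` from LinearAlgebra. This is a sketch, not part of the answer above; the function name `h` and the variable `out` are made up:

```julia
using LinearAlgebra

# Write each matrix-vector product directly into a column of a
# preallocated output, so the only allocation is the result itself.
function h(X::Array{Int,3}, beta::Vector{Float64})::Matrix{Float64}
    out = Matrix{Float64}(undef, size(X, 1), size(X, 3))
    for i in 1:size(X, 3)
        @views mul!(out[:, i], X[:, :, i], beta)  # in-place product
    end
    return out
end
```

Each `mul!` call writes into a column view of `out`, so no intermediate vectors and no final `hcat` are needed.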
Upvotes: 3