djourd1

Reputation: 479

Julia: vector of matrices or 3D array

I want to multiply several matrices (all of the same dimensions) by a vector beta. I have tried two ways of storing the matrices: as a vector of matrices, and as a three-dimensional array.

Is it normal that the version with the vector of matrices runs faster?

using BenchmarkTools

nid = 10000
npar = 5
x1 = reshape(repeat([1], nid * npar), nid, :)
x2 = reshape(repeat([2], nid * npar), nid, :)
x3 = reshape(repeat([3], nid * npar), nid, :)
X = reshape([x1 x2 x3], nid, npar, :);
X1 = [x1, x2, x3]
beta = rand(npar)



function f(X::Array{Int,3}, beta::Vector{Float64})::Array{Float64}
    hcat([X[:, :, i] * beta for i = 1:size(X, 3)]...)
end

function g(X::Array{Array{Int,2},1}, beta::Vector{Float64})::Array{Float64}
    hcat([X[i] * beta for i = 1:size(X, 1)]...)
end

f(X,beta);
g(X1,beta);

@benchmark f(X, beta)
@benchmark g(X1, beta)

Results indicate that f takes almost twice as long as g. Is this a normal pattern, or am I not using the 3D array properly?

Upvotes: 1

Views: 744

Answers (1)

Antonello

Reputation: 6423

That's because the slicing operation `X[:,:,i]` copies each matrix, hence the growth in memory allocations.
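You can see the copy directly with `@allocated`; this comparison is my own illustration, not part of the original answer:

```julia
# Sketch: a slice materializes a full copy of the 10_000×5 matrix,
# while a view only allocates a small SubArray wrapper (if anything).
X = ones(Int, 10_000, 5, 3)

slice_bytes = @allocated X[:, :, 1]        # copies the whole matrix
view_bytes  = @allocated @view X[:, :, 1]  # lightweight reference

println((slice_bytes, view_bytes))  # the slice allocates far more
```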

Notice the last line of each benchmark:

julia> @benchmark f(X, beta)
BenchmarkTools.Trial: 8011 samples with 1 evaluation.
 Range (min … max):  356.013 μs …   4.076 ms  ┊ GC (min … max): 0.00% … 87.21%
 Time  (median):     457.231 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   615.235 μs ± 351.236 μs  ┊ GC (mean ± σ):  6.54% ± 11.69%

  ▃▇██▆▅▄▄▄▃▂        ▂▃▄▄▄▃▁▁▁ ▁   ▁                            ▂
  ████████████▆▇▅▅▄▄▇███████████████▇█▇▇█▇▆▇█▆▆▆▆▅▆▅▅▄▃▃▂▂▄▂▄▄▄ █
  356 μs        Histogram: log(frequency) by time       1.96 ms <

 Memory estimate: 1.60 MiB, allocs estimate: 17.

julia> @benchmark g(X1, beta)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  174.348 μs …   2.493 ms  ┊ GC (min … max): 0.00% … 83.85%
 Time  (median):     219.383 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   245.192 μs ± 119.612 μs  ┊ GC (mean ± σ):  3.54% ±  7.68%

   ▃▇█▂                                                          
  ▅████▆▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  174 μs           Histogram: frequency by time          974 μs <

 Memory estimate: 469.25 KiB, allocs estimate: 11.

To avoid it, just use the `@views` macro, which turns the slices into references (views) instead of copies, without allocating. The timings of the two implementations then become the same, up to random noise:

function fbis(X::Array{Int,3}, beta::Vector{Float64})::Array{Float64}
    @views hcat([X[:, :, i] * beta for i = 1:size(X, 3)]...)
end


julia> @benchmark fbis(X, beta)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  175.984 μs …   2.710 ms  ┊ GC (min … max): 0.00% … 79.70%
 Time  (median):     225.990 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   274.611 μs ± 166.015 μs  ┊ GC (mean ± σ):  4.17% ±  7.78%

   ▅█▃                                                           
  ▆███▆▄▃▄▄▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  176 μs           Histogram: frequency by time         1.15 ms <

 Memory estimate: 469.25 KiB, allocs estimate: 11.
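An equivalent way to get views without the macro (my own variant, not from the original answer) is `eachslice`, which iterates over views along the chosen dimension:

```julia
# Sketch: eachslice(X, dims=3) yields a view for each X[:, :, i],
# so no matrix is copied before the multiplication.
function fter(X::Array{Int,3}, beta::Vector{Float64})::Array{Float64}
    hcat([A * beta for A in eachslice(X, dims=3)]...)
end
```

This should give the same result as `f` and `fbis`, with an allocation profile comparable to the view-based version.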

While in this case using references improves the benchmarks, be careful not to overuse them. If you are "creating" some matrix that you are going to use over and over again, the one-time allocation cost of copying it may become negligible compared to the cost of repeatedly looking up scattered memory locations through a view.
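As an illustration of that trade-off (the strided slice below is my own example, not from the original answer): a non-contiguous view avoids the upfront copy, but every later access pays for the strided indexing, whereas the copy works on dense memory from then on.

```julia
X = rand(1000, 1000, 3)

v = @view X[:, 1:2:end, 1]  # no copy, but strided, non-contiguous access
c = X[:, 1:2:end, 1]        # one allocation, then dense contiguous memory

# Both hold the same values; they differ only in memory layout,
# which is what matters once the data is reused many times.
println(v == c)
```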

Upvotes: 3
