I am interested in calculating the quantity
(x_i - μ_k)' * W_k * (x_i - μ_k)
where x_i is a 1×D vector (one of my N data points of dimension D), μ is a D×K matrix whose k-th column is μ_k, and W is a list of K D×D matrices. For a given i this should result in a 1×K vector (one entry per k). I compute it for all N and K in the following way, which works:
res = zeros(N, K);
for i in 1:N
    for k in 1:K
        res[i,k] = (x_matrix[i,:] - mus_matrix[:,k])' *
                   w_matrix[k] * (x_matrix[i,:] - mus_matrix[:,k])
    end
end
If I try to vectorize it using the following:
res = zeros(N, K);
for i in 1:N
    res[i,:] = (x_matrix[i,:] .- mus_matrix)' .* w_matrix .* (x_matrix[i,:] .- mus_matrix)
end
I get the following error:
ERROR: DimensionMismatch("arrays could not be broadcast to a common size")
Stacktrace:
[1] _bcs1(::Base.OneTo{Int64}, ::Base.OneTo{Int64}) at ./broadcast.jl:70
[2] _bcs at ./broadcast.jl:63 [inlined]
[3] broadcast_shape(::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}, ::Tuple{Base.OneTo{Int64}}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}, ::Vararg{Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},N} where N) at ./broadcast.jl:57 (repeats 3 times)
[4] broadcast_indices(::Array{Float64,2}, ::Array{Any,1}, ::Array{Float64,1}, ::Vararg{Any,N} where N) at ./broadcast.jl:53
[5] broadcast_c(::Function, ::Type{Array}, ::Array{Float64,2}, ::Array{Any,1}, ::Vararg{Any,N} where N) at ./broadcast.jl:311
[6] broadcast(::Function, ::Array{Float64,2}, ::Array{Any,1}, ::Array{Float64,1}, ::Vararg{Any,N} where N) at ./broadcast.jl:434
Here is an example:
julia> N = 5
5
julia> D=2
2
julia> K = 4
4
julia> W=[]
0-element Array{Any,1}
julia> x = rand(N,D)
5×2 Array{Float64,2}:
0.576477 0.9575
0.184454 0.660436
0.470267 0.729649
0.648879 0.782561
0.626453 0.111332
julia> mu = rand(K,D)
4×2 Array{Float64,2}:
0.989281 0.00126782
0.659106 0.66136
0.50843 0.289442
0.327962 0.523229
julia> for i in 1:K
push!(W,rand(D,D))
end
And then run
julia> (x_matrix[i,:]-mus_matrix[:,k])'*
w_matrix[k]*(x_matrix[i,:]-mus_matrix[:,k])
34649.850360744866
But with the second (vectorized) code:
julia> (x_matrix[i,:].-mus_matrix)'.*w_matrix.*(x_matrix[i,:].-mus_matrix)
ERROR: DimensionMismatch("arrays could not be broadcast to a common size")
Stacktrace:
[1] _bcs1(::Base.OneTo{Int64}, ::Base.OneTo{Int64}) at ./broadcast.jl:70
[2] _bcs at ./broadcast.jl:63 [inlined]
[3] broadcast_shape(::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}, ::Tuple{Base.OneTo{Int64}}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}, ::Vararg{Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},N} where N) at ./broadcast.jl:57 (repeats 3 times)
[4] broadcast_indices(::Array{Float64,2}, ::Array{Any,1}, ::Array{Float64,1}, ::Vararg{Any,N} where N) at ./broadcast.jl:53
[5] broadcast_c(::Function, ::Type{Array}, ::Array{Float64,2}, ::Array{Any,1}, ::Vararg{Any,N} where N) at ./broadcast.jl:311
[6] broadcast(::Function, ::Array{Float64,2}, ::Array{Any,1}, ::Array{Float64,1}, ::Vararg{Any,N} where N) at ./broadcast.jl:434
TL;DR: the optimized variant is at the bottom, but the Einsum version looks nicer, IMHO.
This looks like a case for Einstein summation notation. In Julia, Einsum.jl can do this:
julia> N = 5
5
julia> D = 3
3
julia> K = 10
10
julia> x = rand(N, D)
5×3 Array{Float64,2}:
0.587436 0.210529 0.261725
0.527269 0.457477 0.482939
0.52726 0.411209 0.138872
0.89107 0.464789 0.758392
0.885267 0.931014 0.672959
julia> μ = rand(D, K)
3×10 Array{Float64,2}:
0.280792 0.265066 0.81437 0.503377 0.0717916 … 0.275872 0.609961 0.0820088 0.0042564
0.0177643 0.0959438 0.563948 0.332433 0.088527 0.691971 0.0296638 0.604488 0.956057
0.668128 0.444816 0.74203 0.518232 0.48689 0.465067 0.117469 0.729514 0.109973
julia> W = rand(K, D, D)
10×3×3 Array{Float64,3}:
[:, :, 1] =
0.320861 0.662103 0.219234
0.780944 0.769377 0.566203
0.466207 0.428527 0.330901
0.15534 0.035435 0.346737
0.810676 0.328116 0.469505
0.676575 0.668204 0.285334
0.455551 0.211295 0.85295
0.229995 0.741487 0.783361
0.0937583 0.401419 0.47032
0.956335 0.434213 0.967791
[:, :, 2] =
0.275903 0.130298 0.184485
0.941648 0.940107 0.439454
0.425292 0.252654 0.797115
0.0203406 0.594075 0.484809
0.164309 0.941597 0.455314
0.73628 0.109502 0.920664
0.906305 0.177235 0.540193
0.360038 0.0486971 0.20626
0.914357 0.699901 0.295872
0.284143 0.659117 0.291479
[:, :, 3] =
0.138311 0.921371 0.353719
0.345247 0.70865 0.246736
0.361364 0.636543 0.343837
0.752149 0.581561 0.346399
0.705888 0.24765 0.703952
0.992327 0.369668 0.109407
0.341624 0.223715 0.970667
0.762169 0.94248 0.917569
0.0367128 0.589345 0.121106
0.826602 0.692111 0.229499
julia> using Einsum
julia> @einsum r[n,k] := (x[n,i] - μ[i,k]) * W[k,i,j] * (x[n,j] - μ[j,k])
julia> r
5×10 Array{Float64,2}:
0.0176889 0.087092 0.522184 0.0417967 … -0.0430999 0.041266 -0.0596579 0.432076
0.0521066 0.364059 0.181515 0.00434307 -0.0248712 0.226976 -0.0686294 0.437169
-0.0472136 0.127803 0.458812 0.0119074 0.0391649 -0.0190299 -0.0585371 0.264379
0.468634 1.16498 -0.00263205 0.192809 0.273537 1.13787 -0.0653081 1.41321
0.749655 2.20266 0.0205068 0.420249 0.573358 1.42499 0.441232 1.67574
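As a quick sanity check (a sketch using the variables defined above), the Einsum result should match the naive per-element quadratic form:
# Each entry is (x_n - μ_k)' * W_k * (x_n - μ_k), computed explicitly.
r_loop = [(x[n, :] .- μ[:, k])' * W[k, :, :] * (x[n, :] .- μ[:, k]) for n in 1:N, k in 1:K]
r_loop ≈ r   # expected to be true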
The @einsum call above expands (via @macroexpand) to essentially the following loops (plus preparation and bounds checking):
begin
    local k
    for k = 1:size(μ, 2)
        begin
            local n
            for n = 1:size(x, 1)
                begin
                    local s = zero(T)
                    begin
                        local j
                        for j = 1:size(W, 3)
                            begin
                                local i
                                for i = 1:size(x, 2)
                                    s += (x[n, i] - μ[i, k]) * W[k, i, j] * (x[n, j] - μ[j, k])
                                end
                            end
                        end
                    end
                    r[n, k] = s
                end
            end
        end
    end
end
Now, to find something more performant, I compared a couple of variants using BenchmarkTools.jl. You can see the full code and the results on my laptop here. It shows that the Einsum variant is already better than the original:
# Original:
# memory estimate: 1017.73 MiB
# allocs estimate: 3429967
# median time: 361.982 ms (15.94% GC)
# Einsum:
# memory estimate: 2.64 MiB
# allocs estimate: 76
# median time: 127.536 ms (0.00% GC)
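For reference, a comparison along these lines might be set up as follows (a sketch only; the wrapper name test_einsum is mine, not taken from the linked code):
using BenchmarkTools, Einsum
# Hypothetical wrapper so allocation plus computation is timed together.
function test_einsum(x, μ, W)
    @einsum r[n, k] := (x[n, i] - μ[i, k]) * W[k, i, j] * (x[n, j] - μ[j, k])
    return r
end
@benchmark test_einsum($x, $μ, $W)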
By far the most efficient and least allocating variant is the following, which requires x = x' and W = permutedims(W, [2, 3, 1]) (assuming you can change your representation easily):
function test_optimized!(res, x, μ, W)
    # Expects x in D×N layout (i.e. x = x') and W as D×D×K (permutedims(W, [2, 3, 1])).
    z = zero(eltype(x))
    for k = 1:size(μ, 2)
        for n = 1:size(x, 2)
            res[n, k] = z
            for i = 1:size(W, 1)
                for j = 1:size(W, 2)
                    @inbounds res[n, k] += (x[i, n] - μ[i, k]) * W[i, j, k] * (x[j, n] - μ[j, k])
                end
            end
        end
    end
end

function test_optimized(x, μ, W)
    res = zeros(size(x, 2), size(μ, 2))   # N×K
    test_optimized!(res, x, μ, W)
    res
end
This brings us down to
# memory estimate: 2.63 MiB
# allocs estimate: 2
# median time: 521.215 μs (0.00% GC)
It uses a couple of "tricks" that can be found in the docs: filling a preallocated matrix in a separate method, accessing strides in column-major order, and using @inbounds (although that only improves things on the order of a microsecond).
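If the result buffer is reused across repeated calls, only the initial zeros allocates at all; a sketch, assuming the xt and Wt defined above:
# Reuse one preallocated buffer across many calls to the in-place method.
buf = zeros(size(xt, 2), size(μ, 2))   # N×K
for trial in 1:100
    test_optimized!(buf, xt, μ, Wt)
end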
There is also TensorOperations.jl, which I think does more intelligent things under the hood, but it fails on this:
julia> @tensor r[n,k] := (x[n,i] - μ[i,k]) * W[k,i,j] * (x[n,j] - μ[j,k])
ERROR: TensorOperations.IndexError{String}("invalid index specification: (:n, :i) to (:i, :k)")
Stacktrace:
[1] add_indices(::Tuple{Symbol,Symbol}, ::Tuple{Symbol,Symbol}) at /home/philipp/.julia/v0.6/TensorOperations/src/implementation/indices.jl:22
[2] + at /home/philipp/.julia/v0.6/TensorOperations/src/indexnotation/sum.jl:40 [inlined]
[3] -(::TensorOperations.IndexedObject{(:n, :i),:N,Array{Float64,2},Int64}, ::TensorOperations.IndexedObject{(:i, :k),:N,Array{Float64,2},Int64}) at /home/philipp/.julia/v0.6/TensorOperations/src/indexnotation/sum.jl:44
I guess that's deliberate and has to do with efficiency; see this issue.