Reputation: 731
I am trying to use the Distances
package in Julia to perform a broadcasted computation of distance matrices.
I understand how to calculate a single N x N
distance matrix for some matrix X
(with dimensions D x N
), where each column X[:,i]
stores a D
-dimensional feature vector for observation i
. The code would be:
using Distances
dist_matrix = pairwise(Euclidean(), X, dims = 2)
dist_matrix
contains the Euclidean distances between each pair of D
-dimensional columns, e.g. dist_matrix[m,n]
stores the Euclidean distance between X[:,m]
and X[:,n]
.
Now imagine my array X
is actually an entire tensor or 'volume' of D
-dimensional observations, so that X[:,i,j]
stores the j
-th 'slice' of my D x N
observations. The whole array X
thus has dimensions D x N x T
, where T
is the number of slices.
Accordingly, I want to compute a tensor or 'volume' of distance matrices, so that dist_matrix
will have dimensions N x N x T
.
Is there a way to do this in a single line by broadcasting the pairwise()
function in Julia? What's the fastest way to do this? Below shows the idea with a basic for loop:
using Distances
dist_matrix_tensor = zeros(N,N,T);
for t = 1:T
dist_matrix_tensor[:,:,t] = pairwise(Euclidean(), X[:,:,t], dims = 2)
end
EDIT:
I figured out how to do this using mapslices
, but still not sure if it's the best way.
using Distances
dist_function(x) = pairwise(Euclidean(), x, dims = 2) # define a function that gets the N x N distance matrix for a single 'slice'
dist_matrix_tensor = mapslices(dist_function, X, dims = [1,2]) # map your matrix-operating function across the slices of the main tensor X
This can of course also be parallelized since each 'slice' of X is independent in this computation, so I'm basically just looking for the fastest way to do this. I'm also in general interested in how you would do this with broadcasting specifically.
Upvotes: 2
Views: 601
Reputation: 1381
Your solution with mapslices
is reasonably performant if the dimension of X
is large. Below is an example with JuliennedArrays which is slightly faster for small X
, but has the same performance as mapslices
when the two first dimensions are of size 100.
using Distances, JuliennedArrays, BenchmarkTools
dist_function(x) = pairwise(Euclidean(), x, dims = 2) # define a function that gets the N x N distance matrix for a single 'slice'
X = randn(10,10,20);
dist_matrix_tensor = @btime mapslices(dist_function, X, dims = [1,2]); # 61.172 μs (198 allocations: 42.28 KiB)
dist_matrix_tensor2 = @btime map(dist_function, Slices(X, 1, 2)); # 41.529 μs (62 allocations: 21.67 KiB)
Note, however, that JuliennedArrays returns a Vector
of Matrix
rather than a three-dimensional array.
Upvotes: 2