Conor
Conor

Reputation: 731

Julia: Broadcasting pairwise distance calculation across tensor of observations

I am trying to use the Distances package in Julia to perform a broadcasted computation of distance matrices.

I understand how to calculate a single N x N distance matrix for some matrix X (with dimensions D x N), where each column X[:,i] stores a D-dimensional feature vector for observation i. The code would be:

using Distances

dist_matrix = pairwise(Euclidean(), X, dims = 2)

dist_matrix contains the Euclidean distances between each pair of D-dimensional columns, e.g. dist_matrix[m,n] stores the Euclidean distance between X[:,m] and X[:,n].

Now imagine my array X is actually an entire tensor or 'volume' of D-dimensional observations, so that X[:,i,j] stores the j-th 'slice' of my D x N observations. The whole array X thus has dimensions D x N x T, where T is the number of slices.

Accordingly, I want to compute a tensor or 'volume' of distance matrices, so that dist_matrix will have dimensions N x N x T.

Is there a way to do this in a single line by broadcasting the pairwise() function in Julia? What's the fastest way to do this? Below shows the idea with a basic for loop:

using Distances

dist_matrix_tensor = zeros(N,N,T);

for t = 1:T
        dist_matrix_tensor[:,:,t] = pairwise(Euclidean(), X[:,:,t], dims = 2)
end

EDIT: I figured out how to do this using mapslices, but still not sure if it's the best way.

using Distances

dist_function(x)  = pairwise(Euclidean(), x, dims = 2) # define a function that gets the N x N distance matrix for a single 'slice'

dist_matrix_tensor = mapslices(dist_function, X, dims = [1,2]) # map your matrix-operating function across the slices of the main tensor X

This can of course also be parallelized since each 'slice' of X is independent in this computation, so I'm basically just looking for the fastest way to do this. I'm also in general interested in how you would do this with broadcasting specifically.

Upvotes: 2

Views: 601

Answers (1)

Fredrik Bagge
Fredrik Bagge

Reputation: 1381

Your solution with mapslices is reasonably performant if the dimension of X is large. Below is an example with JuliennedArrays which is slightly faster for small X, but has the same performance as mapslices when the two first dimensions are of size 100.

using Distances, JuliennedArrays, BenchmarkTools

dist_function(x)  = pairwise(Euclidean(), x, dims = 2) # define a function that gets the N x N distance matrix for a single 'slice'

X = randn(10,10,20);
dist_matrix_tensor = @btime mapslices(dist_function, X, dims = [1,2]); # 61.172 μs (198 allocations: 42.28 KiB)
dist_matrix_tensor2 = @btime map(dist_function, Slices(X, 1, 2)); # 41.529 μs (62 allocations: 21.67 KiB)

Note, however, that JuliennedArrays returns a Vector of Matrix rather than a three-dimensional array.

Upvotes: 2

Related Questions