janfabian

Reputation: 404

Comparing two sets of vectors

I've got matrices A and B

size(A) = [n x]; size(B) = [n y];

Now I need to compute the Euclidean distance from each column vector of A to each column vector of B. I'm using the dist function right now:

Q = dist([A B]); Q = Q(1:x, x+1:end);

But it also does a lot of needless work (such as calculating the distances among the vectors of A and among the vectors of B).

What is the best way to calculate this?

Upvotes: 2

Views: 902

Answers (3)

Bitwise

Reputation: 7807

Another solution, if you don't have pdist2, which may also be faster for very large matrices, is to vectorize the following identity:

||x-y||^2 = ||x||^2 + ||y||^2 - 2*dot(x,y)

where ||a|| is the L2 norm (Euclidean norm) of a.

Comments:

  1. C = -2*A'*B (an x-by-y matrix) is the vectorized form of the -2*dot(x,y) term for every pair of columns.
  2. ||x-y||^2 is the square of the Euclidean distance you are looking for, so take the square root at the end.

Is that enough or do you need the explicit code?

The reason this may be faster in practice is that you avoid an explicit difference-and-sum for each of the x*y comparisons; the bottleneck becomes a single matrix multiplication, and matrix multiplication is highly optimized in MATLAB. You are taking advantage of the fact that this is the Euclidean distance and not just some unknown metric.
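For reference, the identity above could be coded roughly as follows. This is a minimal sketch, assuming A and B are real matrices with one vector per column; the names sqA, sqB and D2 and the random example data are illustrative and not from the question.

```matlab
% Example data (assumption): n = 3, x = 4, y = 5
A = rand(3, 4);
B = rand(3, 5);

sqA = sum(A .^ 2, 1);                 % 1-by-x, squared norm of each column of A
sqB = sum(B .^ 2, 1);                 % 1-by-y, squared norm of each column of B

% ||a||^2 + ||b||^2 - 2*dot(a,b) for every pair of columns, giving an x-by-y matrix
D2 = bsxfun(@plus, sqA.', sqB) - 2 * (A.' * B);

D2 = max(D2, 0);                      % rounding can leave tiny negative values
Q  = sqrt(D2);                        % Q(i,j) = distance between A(:,i) and B(:,j)
```

The max(D2, 0) step is a precaution: for nearly identical vectors the subtraction can produce a small negative number, and sqrt would then return a complex result.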

Upvotes: 0

Eitan T

Reputation: 32920

An alternative solution to pdist2, if you don't have the Statistics Toolbox, is to compute this manually. For example, one way to do it is:

[X, Y] = meshgrid(1:size(A, 2), 1:size(B, 2)); % or meshgrid(1:x, 1:y)
Q = sqrt(sum((A(:, X(:)) - B(:, Y(:))) .^ 2, 1));

The indices of the columns from A and B for each value in vector Q can be obtained by computing:

[X(:), Y(:)]

where each row contains a pair of indices: the first is the column index in matrix A, and the second is the column index in matrix B.
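If an x-by-y matrix is more convenient than a vector, Q can be reshaped using the size of X. The sketch below repeats the two lines above on made-up example data; Qmat is a hypothetical name introduced here for illustration.

```matlab
% Example data (assumption): n = 3, x = 4, y = 5
A = rand(3, 4);
B = rand(3, 5);

[X, Y] = meshgrid(1:size(A, 2), 1:size(B, 2));        % X and Y are y-by-x
Q = sqrt(sum((A(:, X(:)) - B(:, Y(:))) .^ 2, 1));     % 1-by-(x*y) vector

% reshape gives a y-by-x matrix; transpose so rows index A and columns index B
Qmat = reshape(Q, size(X)).';                         % Qmat(i,j) = distance between A(:,i) and B(:,j)
```

Note that A(:, X(:)) and B(:, Y(:)) each build an n-by-(x*y) temporary matrix, so this approach can use a lot of memory when x and y are large.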

Upvotes: 1

petrichor

Reputation: 6579

You are looking for pdist2.

% Compute the ordinary Euclidean distance
D = pdist2(A.',B.','euclidean'); % euclidean distance

You should take the transpose of the matrices since pdist2 assumes the observations are in rows, not in columns.
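As a small illustration of the orientation, here is a sketch on made-up data (the values of A and B are assumptions chosen so the distances are easy to check by hand). It requires the Statistics Toolbox.

```matlab
A = [0 3; 0 4];          % two column vectors: (0,0) and (3,4)
B = [0 0 6; 0 5 8];      % three column vectors: (0,0), (0,5) and (6,8)

D = pdist2(A.', B.');    % 'euclidean' is the default metric
% D is 2-by-3, and D(i,j) is the distance between A(:,i) and B(:,j),
% e.g. D(2,3) is the distance between (3,4) and (6,8), which is 5
```

The result has one row per column of A and one column per column of B, so no further indexing is needed.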

Upvotes: 3
