Vectorizing distance to several points on Octave (Matlab)

Question

I'm writing a k-means algorithm. At each step, I want to compute the distance of my n points to k centroids, without a for loop, and for d dimensions.

The problem is I have a hard time splitting on my number of dimensions with the Matlab functions I know. Here is my current code, with x being my n 2D-points and y my k centroids (also 2D-points of course), and with the points distributed along dimension 1, and the spatial coordinates along the dimension 2:

  dist = @(a,b) (a - b).^2;

  dx = bsxfun(dist, x(:,1), y(:,1)'); % x is (n,1) and y is (1,k)
  dy = bsxfun(dist, x(:,2), y(:,2)'); % so the result is (n,k)
  
  dists = dx + dy; % contains the square distance of each points to the k centroids
  [_,l] = min(dists, [], 2); % we then argmin on the 2nd dimension

How to vectorize furthermore ?

First edit 3 days later, searching on my own

Since asking this question I made progress on my own towards vectorizing this piece of code. The code above runs in approximately 0.7 ms on my example.

I first used repmat to make it easy to do broadcasting:

dists = permute(permute(repmat(x,1,1,k), [3,2,1]) - y, [3,2,1]).^2;
dists = sum(dists, 2);
[~,l] = min(dists, [], 3);

As expected it is slightly slower since we replicate the matrix, it runs at 0.85 ms.

From this example it was pretty easy to use bsxfun for the whole thing, but it turned out to be extremely slow, running in 150 ms so more than 150 times slower than the repmat version:

dist = @(a, b) (a - b).^2;
dists = permute(bsxfun(dist, permute(x, [3, 2, 1]), y), [3, 2, 1]);
dists = sum(dists, 2);
[~,l] = min(dists, [], 3);

Why is it so slow ? Isn't vectorizing always an improvement on speed, since it uses vector instructions on the CPU ? I mean of course simple for loops could be optimized to use it aswell, but how can vectorizing make the code slower ? Did I do it wrong ?

Using a for loop

For the sake of completeness, here's the for loop version of my code, surprisingly the fastest running in 0.4 ms, not sure why..

for i=1:k
  dists(:,i) = sum((x - y(i,:)).^2, 2);
endfor
[~,l] = min(dists, [], 2);

Vectorizing distance to several points on Octave (Matlab)

First edit 3 days later, searching on my own

Using a for loop

Answers (1)

Related Questions