user494461

How to find the closest vector to a given vector in MATLAB?

I have a set of n-dimensional representative vectors in MATLAB. I need to assign each vector in a set of training vectors to the group of the representative vector it is closest to. How should I do it?

Upvotes: 0

Views: 3361

Answers (2)

Dan

Reputation: 45762

If by n-dimensional vector you mean an ordered list of n-dimensional points (that's my understanding of what you want), then I have done this in the past using the mean closest distance. For each point on vector one, find the smallest distance to any point on vector two; the distance from vector one to vector two is then the mean of these smallest distances. This is not symmetric, however, so you should repeat the process for each point on vector two, finding its smallest distance to vector one, and then aggregate the two means with a min, max or mean, etc.
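To make the asymmetry concrete, here is a minimal sketch on hypothetical toy data. The points and the names f1, f2, D, mcd12 and mcd21 are made up for illustration, and the pairwise distance matrix is built with bsxfun rather than with a separate distance function:

```matlab
% Two toy "fibers", one 3-D point per row (illustrative data only)
f1 = [0 0 0; 1 0 0; 2 0 0];   % 3 points
f2 = [0 1 0; 2 1 0];          % 2 points

% Pairwise coordinate differences, size 3 x 2 x 3 (points1 x points2 x xyz)
diffs = bsxfun(@minus, permute(f1, [1 3 2]), permute(f2, [3 1 2]));
D = sqrt(sum(diffs.^2, 3));   % D(i,j) = distance from f1(i,:) to f2(j,:)

mcd12 = mean(min(D, [], 2));  % mean closest distance, f1 -> f2
mcd21 = mean(min(D, [], 1));  % mean closest distance, f2 -> f1
```

Here mcd12 is (2 + sqrt(2))/3 while mcd21 is 1, so the two directions disagree and have to be combined (min, max or mean) to get a symmetric measure.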

Here is some code I made (for 3d vectors) using loops:

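function mcd = MCD(fiber1, fiber2, option)
%MCD Mean closest distance between two fibers (loop version).
%   MCD(fiber1, fiber2)          one-sided: fiber1 -> fiber2
%   MCD(fiber1, fiber2, option)  symmetric; option is 'mean' or 'min'
%   Each fiber is an N-by-3 matrix with one point per row.

%remove NaNs (assumes NaN padding at the end: everything from the first
%NaN row onwards is dropped)
fiber1(find(isnan(fiber1),1):size(fiber1,1),:) = [];
fiber2(find(isnan(fiber2),1):size(fiber2,1),:) = [];

n1 = size(fiber1,1); %number of points on fiber1
n2 = size(fiber2,1); %number of points on fiber2

%mean over fiber1 of the smallest distance to any point on fiber2
dist = 0;
for k = 1:n1
    D = zeros(1,n2);
    for j = 1:n2
        D(j) = distance(fiber1(k,:),fiber2(j,:));
    end
    dist = dist + min(D);
end
mcd = dist / n1;

if nargin > 2

    %same again in the other direction
    dist = 0;
    for k = 1:n2
        D = zeros(1,n1);
        for j = 1:n1
            D(j) = distance(fiber2(k,:),fiber1(j,:));
        end
        dist = dist + min(D);
    end
    mcd2 = dist / n2;

    %aggregate the two one-sided means
    if strcmp(option,'mean')
        mcd = mean([mcd mcd2]);
    elseif strcmp(option,'min')
        mcd = min([mcd mcd2]);
    end
end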

This was much too slow for me, so here is a vectorised (but difficult to follow) version that is very fast:

function mcd = MCD(fiber1, fiber2, option, sampling)

%MCD(fiber1, fiber2)
%MCD(fiber1, fiber2, option)
%MCD(fiber1, fiber2, option, sampling)



%remove NaNs
fiber1(find(isnan(fiber1),1):length(fiber1),:) = [];
fiber2(find(isnan(fiber2),1):length(fiber2),:) = [];

%sample the fibers for speed. Each fiber is represented by "sampling"
%number of points.

if nargin == 4

    freq = max(1, round(size(fiber1,1)/sampling)); %step of at least 1
    fiber1 = fiber1(1:freq:end,:);
    freq = max(1, round(size(fiber2,1)/sampling));
    fiber2 = fiber2(1:freq:end,:);

end;

%reshape to optimize the use of distance() for speed
F1 = size(fiber1,1); %number of points on fiber1
F2 = size(fiber2,1); %number of points on fiber2
FIBER2 = reshape(fiber2',[1,3,F2]);
FIBER1 = reshape(fiber1',[1,3,F1]); %this is only used in the symmetrical case, i.e. when the 'min' or 'mean' option is called


%reshape and tile fiber1 so as to eliminate the need for two nested for
%loops, thus greatly increasing the computational efficiency. The goal is
%a 4D array of size 1x3xF2xF1 in which every 1x3 point of fiber1 is
%repeated F2 times along dim 3, so that it lines up against each point of
%fiber2. Thus dim 3 is as long as fiber2 and dim 4 is as long as fiber1.

fiber1 = reshape(fiber1',[1,3,F1]); %1x3xF1
fiber1 = repmat(fiber1,[F2,1,1]); %F2x3xF1
fiber1 = permute(fiber1,[2,1,3]); %3xF2xF1
fiber1 = reshape(fiber1,[1,3,F2,F1]); %1x3xF2xF1

%min over the points of fiber2 (dim 3), then mean over fiber1 (dim 4)
mcd = mean(min(distance(fiber1, repmat(FIBER2,[1,1,1,F1])),[],3),4);

if nargin > 2

    fiber2 = reshape(fiber2',[1,3,F2]); %1x3xF2
    fiber2 = repmat(fiber2,[F1,1,1]); %F1x3xF2
    fiber2 = permute(fiber2,[2,1,3]); %3xF1xF2
    fiber2 = reshape(fiber2,[1,3,F1,F2]); %1x3xF1xF2

    mcd2 = mean(min(distance(fiber2, repmat(FIBER1,[1,1,1,F2])),[],3),4);

    if strcmp(option,'mean')
        mcd = mean([mcd mcd2]);
    elseif strcmp(option,'min')
        mcd = min([mcd mcd2]);
    end;
end;

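This is the distance() function I used for the above. In my case I used Euclidean distances, but you can adapt it to whatever is best for you, so long as it accepts two vectors (for the vectorised version, two equally sized arrays with a singleton first dimension and the coordinates along dimension 2):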

function Edist = distance(vector1,vector2)

%distance(vector1,vector2)
%
%provides the Euclidean distance between two input vectors. Vector1 and
%vector2 must be row vectors of the same length. The number of elements in
%each vector is the dimensionality thereof.

Edist = sqrt(sum((diff([vector1;vector2])).^2));
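
As a quick check of what such a function has to do, here is a small sketch using an anonymous-function stand-in. The name edist and the test points are assumptions for illustration; it sums explicitly along dimension 2 instead of relying on diff, but should give the same results as distance() for row vectors and for points stacked along dimension 3:

```matlab
% Stand-in for distance(): Euclidean norm taken along dimension 2
edist = @(a, b) sqrt(sum((a - b).^2, 2));

% Two single row vectors
d = edist([0 0 0], [1 2 2]);

% Two points per input, stacked along dim 3 (each input is 1x3x2)
A = cat(3, [0 0 0], [1 1 1]);
B = cat(3, [1 2 2], [1 1 1]);
D = edist(A, B);   % one distance per stacked pair, size 1x1x2
```

Any replacement metric needs to preserve this shape behaviour if it is to be dropped into the vectorised MCD.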

Upvotes: 1

kenm

Reputation: 23985

You can use dsearchn to find which representative is closest to each point. I would recommend trying the version that doesn't involve a triangulation matrix first. If the memory or CPU performance isn't good enough, look into the triangulation stuff.
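
A minimal sketch of that suggestion, assuming the representatives and the training vectors are stored one per row. The matrices reps and train are made-up toy data, and this is the two-argument form of dsearchn with no triangulation passed in:

```matlab
% Representative vectors, one per row (toy 2-D data for illustration)
reps  = [0 0; 10 10; 0 10];
% Training vectors to be grouped, one per row
train = [1 1; 9 8; 2 9; -1 0];

% idx(i) is the row of reps closest (Euclidean) to train(i,:);
% d(i) is the corresponding distance
[idx, d] = dsearchn(reps, train);

% Training vectors assigned to the second representative
members2 = train(idx == 2, :);
```

For this data idx holds 1, 2, 3, 1, so each training vector is labelled with the index of its nearest representative, and logical indexing on idx pulls out any one group.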

Upvotes: 4
