Reputation: 14977
I'm new to Matlab, and I want to do the following.
I have 2500 data points that can be clustered into 10 groups. My aim is to find, for each cluster, the 5 data points that are closest to its centroid. To do that, I did the following.
1) Find the distance from each point to each centroid, and assign each data point to its closest cluster.
2) Store the data point's index (1,...,2500) and the corresponding distance in a cluster{index} array (not sure what data type this should be), where index = 1,2,...,10.
3) Go through each cluster to find the 5 closest data points.
My problem is that I don't know how many data points will be stored in each cluster, so I don't know which data type I should use for my clusters or how to add to them in Step 2. I think a cell array may be what I need, but then I'll need one for the data point index and one for the distance. Or can I create a cell array of structures (each structure consisting of 2 members, index and distance)? Again, how could I dynamically add to each cluster then?
Upvotes: 0
Views: 125
Reputation: 1275
I would suggest you keep the data in a normal array; this usually works the quickest in Matlab.
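To answer the literal question first: the per-cluster structure you describe does work, because Matlab arrays can grow by assigning to end+1. Below is a minimal sketch on toy data; the names `nearest`, `d`, `clusters` and `closest` are made up for illustration and are not part of the code further down.

```matlab
% Toy data (assumed for illustration): 6 points, 2 clusters.
nearest = [1 2 1 1 2 1];              % closest centroid for each point
d       = [0.3 0.1 0.2 0.5 0.4 0.1];  % distance to that centroid
m = 2;

% One struct per cluster, each with two fields that start out empty.
clusters = repmat(struct('index', [], 'distance', []), m, 1);

for k = 1:numel(nearest)
    g = nearest(k);
    % Assigning to end+1 appends, so the fields grow as needed.
    clusters(g).index(end+1)    = k;
    clusters(g).distance(end+1) = d(k);
end

% Points of cluster 1, nearest to the centroid first.
[~, order] = sort(clusters(1).distance);
closest = clusters(1).index(order);
```

Growing arrays inside a loop forces repeated reallocation, which is one reason the array-based version is usually the quicker choice.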
With plain arrays, you could do as follows (assuming p is an n=2500 by dim matrix of data points, and c is an m=10 by dim matrix of centroids):
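n = size(p,1);  % number of data points
m = size(c,1);  % number of centroids
% Euclidean distance from every point to every centroid (n by m).
dists = zeros(n,m);
for i = 1:m
    dists(:,i) = sqrt(sum(bsxfun(@minus,p,c(i,:)).^2,2));
end
% Nearest centroid, and the distance to it, for each point.
[mindists,groups] = min(dists,[],2);
% Rank of each point within its own group (1 = closest to the centroid).
orderOfClosenessInGroup = zeros(size(groups));
for i = 1:m
    [~,permutation] = sort(mindists(groups==i));
    % Sorting the permutation inverts it, which turns sort order into ranks.
    [~,orderOfClosenessInGroup(groups==i)] = sort(permutation);
end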
Then groups will be an n by 1 matrix of values 1 to m telling you which centroid the corresponding data point is closest to, and orderOfClosenessInGroup is an n by 1 matrix telling you the order of closeness inside each group (orderOfClosenessInGroup <= 5 will give you a logical vector of which data points are among the 5 closest to their centroid in their group). To illustrate it, try the following example:
n = 2500;
m = 10;
dim = 2;
c = rand(m,dim);
p = rand(n,dim);
Then run the above code, and finally plot the data as follows:
scatter(p(:,1),p(:,2),100./orderOfClosenessInGroup,[0,0,1],'x');
hold on;
scatter(c(:,1),c(:,2),50,[1,0,0],'o');
figure;
scatter(p(orderOfClosenessInGroup<=5,1),p(orderOfClosenessInGroup<=5,2),50,[0,0,1],'x');
hold on;
scatter(c(:,1),c(:,2),50,[1,0,0],'o');
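This will give you two figures: the first shows all data points, with the marker size shrinking as the rank within the group increases, and the second shows only the 5 closest points of each group. The centroids are the red circles in both.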
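If you need the actual indices of the 5 closest points in each cluster rather than a logical vector, one option is to sort the distances within each group. This is only a sketch that reuses the variable names from above; `top5`, `members` and `k` are names I made up, and a cell array is used because a cluster could end up with fewer than 5 points.

```matlab
% Small random example (same setup as above, fewer points).
n = 200; m = 10; dim = 2; k = 5;
c = rand(m,dim);
p = rand(n,dim);

% Distance from every point to every centroid, then the nearest centroid.
dists = zeros(n,m);
for i = 1:m
    dists(:,i) = sqrt(sum(bsxfun(@minus,p,c(i,:)).^2,2));
end
[mindists,groups] = min(dists,[],2);

% top5{i} holds the indices (rows of p) of the up-to-k closest points of group i.
top5 = cell(m,1);
for i = 1:m
    members = find(groups==i);            % rows of p assigned to group i
    [~,order] = sort(mindists(members));  % nearest first
    top5{i} = members(order(1:min(k,numel(order))));
end
```

For example, p(top5{3},:) then gives the coordinates of the closest points in cluster 3, nearest first.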
Upvotes: 2