user3043636

Reputation: 579

K-means in Matlab

I have a Knowledge Base (KB) represented by a 100x15 matrix A, and I need to cluster it into 5 clusters.

I used the following Matlab code:

idx=kmeans(A,5)

The result idx contains the cluster index for each row of A.

Now I have a new 1x15 vector B, a sort of new entry, and I need to assign it to a cluster based on the clustering already obtained.

When I add the new entry B to the KB and call the function again on C (A with B appended as an extra row)

idx1=kmeans(C,5)

I obtain a new idx1 in which all the results differ from idx.

My goal is to find out which cluster B belongs to, with respect to the clusters obtained by clustering the KB.

Could you help me?

Thanks in advance.

Upvotes: 1


Answers (1)

Geoff

Reputation: 1212

It sounds like you want to compare the new data point to the already-identified clusters. I'm not sure this will always give the results you expect, but you could just compute Euclidean distances to each cluster centroid and pick the smallest.

Example

Original data, constructed so as to have four clusters:

%// original data
A=[randn(25,1),   randn(25,1);
   randn(25,1)+5, randn(25,1);
   randn(25,1)+5, randn(25,1)+5;
   randn(25,1),   randn(25,1)+5];
plot(A(:,1),A(:,2),'k.');
hold on;

100 random points

K-means clustering with K=4 clusters:

K=4;
[idx,centroids]=kmeans(A,K);
for n=1:K
    plot(A(idx==n,1),A(idx==n,2),'o');
end

Original points clustered

Note that the second output of kmeans returns the centroid coordinates for each cluster.

Random new point:

%// new point:
B=2*randn(1,2);
plot(B(1),B(2),'rx');

Distance between new point and all centroids:

dist2cent = sqrt(sum((repmat(B,[K,1])-centroids).^2,2));

Index of smallest distance:

[~,closest] = min(dist2cent);

plot([centroids(closest,1), B(1)],...
     [centroids(closest,2), B(2)],...
     'r-');

Random new point clustered by closest centroid
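Applied to the sizes in the question (a 100x15 matrix A, 5 clusters, a 1x15 vector B), the same idea might look like the sketch below. The random data, the fixed seed and the variable names d, clusterOfB and idx1 are placeholders of mine, not part of the original problem. The last two lines show an alternative that stays closer to what was tried in the question: re-running kmeans on C, but seeded with the previous centroids through the 'Start' option. kmeans picks its initial centroids at random by default, so cluster numbers are arbitrary from one run to the next, which is one likely reason idx1 looked completely different from idx. Seeding should usually keep the numbering aligned, although the clusters themselves can still shift once B is included.

```matlab
%// Sketch only: placeholder data with the sizes from the question
rng(0);                              %// fixed seed, just to make the run repeatable
A = rand(100,15);                    %// stands in for the knowledge base
[idx,centroids] = kmeans(A,5);       %// cluster once and keep the centroids

B = rand(1,15);                      %// stands in for the new entry

%// Euclidean distance from B to each of the 5 centroids (5x1 vector)
d = sqrt(sum(bsxfun(@minus,centroids,B).^2,2));
[~,clusterOfB] = min(d);             %// nearest centroid, in the numbering used by idx

%// Alternative: re-cluster A and B together, starting from the old centroids
C = [A; B];
idx1 = kmeans(C,5,'Start',centroids);
```

Here clusterOfB uses the same numbering as idx, so it can be compared directly with the labels of the rows of A. With the seeded re-clustering, idx1(end) is the label given to B.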

Upvotes: 2
