Reputation: 543
I need to perform dimensionality reduction on a multi-dimensional data set that has been clustered using k-means. The data contains positive and negative real numbers obtained from sensors placed on a haptic glove. The data is captured while performing an action, say drawing the letter "A":
0.1373 -1.8764
-1.7020 -0.8322
0.4862 0.8276
-0.0078 1.3597
0.9008 1.8043
2.9751 0.7125
-0.3257 0.1754
Now, my confusion is with the following code:
K = 3;
load('b2.txt');
data = b2;
numObservarations = length(data);

%% cluster
opts = statset('MaxIter', 500, 'Display', 'iter');
[clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options', opts, ...
    'distance', 'sqEuclidean', 'EmptyAction', 'singleton', 'replicates', 3);

%% plot data+clusters
figure, hold on
scatter3(data(:,1), data(:,2), data(:,3), 50, clustIDX, 'filled')
scatter3(clusters(:,1), clusters(:,2), clusters(:,3), 200, (1:K)', 'filled')
hold off, xlabel('x'), ylabel('y'), zlabel('z')
What is wrong here, and how do I rectify it?
After obtaining the clusters across all dimensions, I now represent the data by its cluster labels, e.g.
1 1 3 2
and so on.
Upvotes: 2
Views: 1027
Reputation: 5073
The code you provide works perfectly well, with a slight modification, for the 2D data set (two features) you provided.
Try it as follows:
data=[ 0.1373 -1.8764
-1.7020 -0.8322
0.4862 0.8276
-0.0078 1.3597
0.9008 1.8043
2.9751 0.7125
-0.3257 0.1754];
numObservarations = length(data);
K=3
%% cluster
%opts = statset('MaxIter', 500, 'Display', 'iter');
[clustIDX, clusters, interClustSum, Dist] = ...
kmeans(data, K, 'MaxIter', 500, 'Display', 'iter', ...
'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3);
%% plot data+clusters
figure, hold on
scatter(data(:,1),data(:,2), 50, clustIDX, 'filled')
scatter(clusters(:,1),clusters(:,2), 200, (1:K)', 'filled')
hold off, xlabel('x'), ylabel('y')
This is the result:
Once again, the dataset you provided contains 2 features, so it is essentially 2D.
As far as I understand, kmeans clusters the data; it does not by itself perform dimensionality reduction (I await anyone else reading this to correct me). For dimensionality reduction, what you really want is PCA or similar. Following PCA you can project your data onto the principal-component axes and display the clusters in a "lower-dimensional" way.
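For illustration, here is a minimal sketch of that PCA step in MATLAB, assuming the Statistics Toolbox function pca is available and that your b2.txt data has more than two feature columns (cluster in the full space, then plot in the projected space):

```matlab
% Minimal PCA sketch (assumes Statistics Toolbox; pca centers the data itself).
load('b2.txt');                 % same file as in the question
data = b2;

% score holds the data expressed in principal-component coordinates.
[coeff, score, latent] = pca(data);

% Keep only the first two principal components for a 2D view.
reduced = score(:, 1:2);

% Cluster in the original full-dimensional space, plot in the reduced one.
K = 3;
clustIDX = kmeans(data, K, 'distance', 'sqEuclidean', ...
    'EmptyAction', 'singleton', 'replicates', 3);

figure
scatter(reduced(:,1), reduced(:,2), 50, clustIDX, 'filled')
xlabel('PC 1'), ylabel('PC 2')
title('clusters projected onto first two principal components')
```

The variable names here (reduced, clustIDX) are just for illustration; the point is that the cluster labels come from the full space while the plot axes come from the projection.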
I don't actually understand what you mean by temporal ordering, but if there is some correlation between temporal events and the features, you can expect kmeans to classify (indirectly) according to those events.
Here's another example, this time with K = 3 clusters. The centroids of the clusters are in the variable clusters returned by kmeans above.
The plot on the left shows the points in the 2D feature space colored according to time (the colorbar shows how relative time maps to color). The middle figure shows which cluster each point was assigned to, using a new color scale; the right plot uses the same color scale and shows the positions of the centroids. The point of the figure is to display the temporal regularity with which features show up.
With regard to your question about temporal ordering, it would appear that kmeans can uncover implicit temporal correlations in the features (if that's what you mean), as shown in the following plot of clustIDX versus time:
But I do not know how it compares to other processing algorithms (why it would be advantageous). I would head to dsp.stackexchange for a better answer.
The subplots were generated with the following code:
subplot(121);
scatter(data(:,2),data(:,3), 50, clustIDX, 'filled')
axis tight
box on
xlabel('feature 1'), ylabel('feature 2')
title('labelled points')
subplot(122);
scatter(clusters(:,2),clusters(:,3), 200, (1:K)', 'filled')
axis tight
box on
xlabel('feature 1'),ylabel('feature 2')
title('clusters')
Second plot:
figure
scatter([1:length(clustIDX)],clustIDX, 50, clustIDX, 'filled')
xlabel('time'),ylabel('cluster')
box on
axis tight
title('labelled points in time domain')
Upvotes: 2