Hi, I was wondering: when you plot clustered data in a figure window, is there a way to show which row each data point belongs to when you hover over it?
Looking at the picture above, I was hoping that selecting or hovering over a point would tell me which row it belongs to.
Here is the code:
%% dimensionality reduction
columns = 6;
[U,S,V]=svds(fulldata,columns);
%% randomly select dataset
rows = 1000;
columns = 6;
%# pick random rows
indX = randperm( size(fulldata,1) );
indX = indX(1:rows);
%# pick random columns
indY = randperm( size(U,2) );   %# U has only 'columns' columns, not size(fulldata,2)
indY = indY(1:columns);
%# filter data
data = U(indX,indY);
%% normalize each column of data to unit Euclidean length
data = data./repmat(sqrt(sum(data.^2)),size(data,1),1);
%% clustering parameters
K = 6;
numObservarations = 1000;
dimensions = 6;
%% cluster
opts = statset('MaxIter', 100, 'Display', 'iter');
[clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options',opts, ...
'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3);
%% plot data+clusters
figure, hold on
scatter3(data(:,1),data(:,2),data(:,3), 5, clustIDX, 'filled')
scatter3(clusters(:,1),clusters(:,2),clusters(:,3), 100, (1:K)', 'filled')
hold off, xlabel('x'), ylabel('y'), zlabel('z')
%% plot clusters quality
figure
[silh,h] = silhouette(data, clustIDX);
avrgScore = mean(silh);
%% Assign data to clusters
% calculate distance (squared) of all instances to each cluster centroid
D = zeros(numObservarations, K); % init distances
for k=1:K
    %d = sum((x-y).^2).^0.5
    D(:,k) = sum( ((data - repmat(clusters(k,:),numObservarations,1)).^2), 2);
end
% for every instance, find the cluster closest to it
[minDists, clusterIndices] = min(D, [], 2);
% compare it with what you expect it to be
sum(clusterIndices == clustIDX)
Or, failing that, is there a way to output the clustered data (normalized) rearranged into its original format, with an extra column at the end recording which row of the original "fulldata" each point came from?
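A minimal sketch of such an output, assuming the variables produced by the script above (`data`, `clustIDX`, `indX`). Since `svds(fulldata, columns)` returns `U` with one row per row of `fulldata`, row `i` of `data = U(indX,indY)` came from row `indX(i)` of `fulldata`, so the original row number can simply be appended as a column. The helper name `appendSourceRows` is hypothetical, and the small arrays are made-up stand-ins so the snippet runs on its own.

```matlab
% Made-up stand-ins; in the real script data, clustIDX and indX already exist.
data     = [0.1 0.2; 0.3 0.4; 0.5 0.6];  % normalized rows of U(indX,indY)
clustIDX = [2; 1; 2];                    % cluster labels returned by kmeans
indX     = [7 3 5];                      % rows of fulldata that were sampled

% Hypothetical helper: [data, cluster id, source row], sorted by the
% source row so the output follows the original row order of fulldata.
appendSourceRows = @(d, c, r) sortrows([d, c(:), r(:)], size(d, 2) + 2);

out = appendSourceRows(data, clustIDX, indX);
disp(out)
```

The last column of `out` indexes `fulldata`, so `fulldata(out(:,end), :)` gives the original (unreduced, unnormalized) rows in the same order, and `out(:,end-1)` is the cluster each of them was assigned to.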
You could use the data cursor feature, which displays a tooltip when you select a point on the plot. By supplying a custom update function, you can display any information you like about the selected point.
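For the plot in the question, every row of `data` is drawn by a single `scatter3` call, so the index of the selected point should already be a row of `data`, and `indX` maps it back to a row of `fulldata`. The sketch below only builds the tooltip text; `tipText` is a hypothetical name, it assumes the cursor event exposes `DataIndex` (as the callback in the working example relies on), and the arrays are made-up stand-ins.

```matlab
% Made-up stand-ins; in the real script clustIDX and indX already exist.
clustIDX = [3; 1; 2];   % cluster label of each row of data
indX     = [42 7 19];   % row of fulldata behind each row of data

% One scatter3 call drew every row of data, so the cursor index idx is a
% row of data and indX(idx) is the matching row of fulldata.
% The anonymous function captures clustIDX and indX when it is created.
tipText = @(idx) { ...
    sprintf('Row in data: %d', idx); ...
    sprintf('Row in fulldata: %d', indX(idx)); ...
    sprintf('Cluster: %d', clustIDX(idx))};

txt = tipText(2);
disp(char(txt))
```

After running the clustering script (without the stand-ins), attaching it along the lines of `hDCM = datacursormode(gcf); set(hDCM, 'UpdateFcn', @(~,evt) tipText(get(evt,'DataIndex')), 'Enable','on')` should be enough. Clicking one of the centroid markers from the second `scatter3` would also call it, with an index into `clusters` instead; comparing `get(evt,'Target')` with the handle returned by the first `scatter3` call would tell the two apart.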
Here is a working example:
function customCursorModeDemo()
    %# data
    D = load('fisheriris');
    data = D.meas;
    [clustIdx,labels] = grp2idx(D.species);
    K = numel(labels);
    clr = hsv(K);
    %# instance indices grouped according to class
    ind = accumarray(clustIdx, 1:size(data,1), [K 1], @(x){x});
    %# plot
    %#gscatter(data(:,1), data(:,2), clustIdx, clr)
    hLine = gobjects(K,1);   %# preallocate graphics handles (R2013a+)
    for k=1:K
        hLine(k) = line(data(ind{k},1), data(ind{k},2), data(ind{k},3), ...
            'LineStyle','none', 'Color',clr(k,:), ...
            'Marker','.', 'MarkerSize',15);
    end
    xlabel('SL'), ylabel('SW'), zlabel('PL')
    legend(hLine, labels)
    view(3), box on, grid on
    %# data cursor
    hDCM = datacursormode(gcf);
    set(hDCM, 'UpdateFcn',@updateFcn, 'DisplayStyle','window')
    set(hDCM, 'Enable','on')
    %# callback function
    function txt = updateFcn(~,evt)
        hObj = get(evt,'Target');     %# line object handle
        idx = get(evt,'DataIndex');   %# index of nearest point
        %# class index of data point
        cIdx = find(hLine==hObj, 1, 'first');
        %# instance index (index into the entire data matrix)
        idx = ind{cIdx}(idx);
        %# output text
        txt = {
            sprintf('SL: %g', data(idx,1)) ;
            sprintf('SW: %g', data(idx,2)) ;
            sprintf('PL: %g', data(idx,3)) ;
            sprintf('PW: %g', data(idx,4)) ;
            sprintf('Index: %d', idx) ;
            sprintf('Class: %s', labels{clustIdx(idx)}) ;
        };
    end
end
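Because each class is drawn as its own line object, `DataIndex` counts points within the clicked line only; `ind{k}` holds the rows of `data` in the order they were passed to `line` for class `k`, which is how the callback recovers the global row. The standalone check below uses made-up labels. Note that `accumarray` does not guarantee the order of values inside each group when the subscripts are unsorted, but the lookup stays correct because the same `ind{k}` is used both to draw and to look up.

```matlab
% Made-up class labels for 6 rows of data (stand-in for clustIdx from grp2idx)
clustIdx = [2; 1; 2; 3; 1; 2];
K = 3;
n = numel(clustIdx);

% ind{k} lists the rows of data in class k, in the order plotted for class k
ind = accumarray(clustIdx, (1:n)', [K 1], @(x){x});

% Suppose the 2nd point of the line drawn for class 2 was clicked
cIdx = 2;                      % which line (class) was clicked
localIdx = 2;                  % DataIndex within that line
rowIdx = ind{cIdx}(localIdx);  % row of the full data matrix
fprintf('Global row: %d\n', rowIdx);
```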
Here is what it looks like in both 2D and 3D views (with different display styles):