G Gr
G Gr

Reputation: 6080

Show rows on clustered kmeans data

Hi I was wondering when you cluster data on the figure screen is there a way to show which rows the data points belong to when you scroll over them?

enter image description here

From the picture above I was hoping there would be a way in which if I select or scroll over the points that I could tell which row it belonged to.

Here is the code:

%% dimensionality reduction 
columns = 6
[U,S,V]=svds(fulldata,columns);
%% randomly select dataset
rows = 1000;
columns = 6;

%# pick random rows
indX = randperm( size(fulldata,1) );
indX = indX(1:rows);

%# pick random columns
indY = randperm( size(fulldata,2) );
indY = indY(1:columns);

%# filter data
data = U(indX,indY);
%% apply normalization method to every cell
data = data./repmat(sqrt(sum(data.^2)),size(data,1),1);

%% generate sample data
K = 6;
numObservarations = 1000;
dimensions = 6;

%% cluster
opts = statset('MaxIter', 100, 'Display', 'iter');
[clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options',opts, ...
'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3);

%% plot data+clusters
figure, hold on
scatter3(data(:,1),data(:,2),data(:,3), 5, clustIDX, 'filled')
scatter3(clusters(:,1),clusters(:,2),clusters(:,3), 100, (1:K)', 'filled')
hold off, xlabel('x'), ylabel('y'), zlabel('z')

%% plot clusters quality
figure
[silh,h] = silhouette(data, clustIDX);
avrgScore = mean(silh);

%% Assign data to clusters
% calculate distance (squared) of all instances to each cluster centroid
D = zeros(numObservarations, K);     % init distances
for k=1:K
%d = sum((x-y).^2).^0.5
D(:,k) = sum( ((data - repmat(clusters(k,:),numObservarations,1)).^2), 2);
end

% find  for all instances the cluster closet to it
[minDists, clusterIndices] = min(D, [], 2);

% compare it with what you expect it to be
sum(clusterIndices == clustIDX)

Or possibly an output method of the clusters data, normalized and re-organized to there original format with appedicies on the end column with which row it belonged to from the original "fulldata".

Upvotes: 2

Views: 1189

Answers (1)

Amro
Amro

Reputation: 124553

You could use the data cursors feature which displays a tooltip when you select a point from the plot. You can use a modified update function to display all sorts of information about the point selected.

Here is a working example:

function customCusrorModeDemo()
    %# data
    D = load('fisheriris');
    data = D.meas;
    [clustIdx,labels] = grp2idx(D.species);
    K = numel(labels);
    clr = hsv(K);

    %# instance indices grouped according to class
    ind = accumarray(clustIdx, 1:size(data,1), [K 1], @(x){x});

    %# plot
    %#gscatter(data(:,1), data(:,2), clustIdx, clr)
    hLine = zeros(K,1);
    for k=1:K
        hLine(k) = line(data(ind{k},1), data(ind{k},2), data(ind{k},3), ...
            'LineStyle','none', 'Color',clr(k,:), ...
            'Marker','.', 'MarkerSize',15);
    end
    xlabel('SL'), ylabel('SW'), zlabel('PL')
    legend(hLine, labels)
    view(3), box on, grid on

    %# data cursor
    hDCM = datacursormode(gcf);
    set(hDCM, 'UpdateFcn',@updateFcn, 'DisplayStyle','window')
    set(hDCM, 'Enable','on')

    %# callback function
    function txt = updateFcn(~,evt)
        hObj = get(evt,'Target');   %# line object handle
        idx = get(evt,'DataIndex'); %# index of nearest point

        %# class index of data point
        cIdx = find(hLine==hObj, 1, 'first');

        %# instance index (index into the entire data matrix)
        idx = ind{cIdx}(idx);

        %# output text
        txt = {
            sprintf('SL: %g', data(idx,1)) ;
            sprintf('SW: %g', data(idx,2)) ;
            sprintf('PL: %g', data(idx,3)) ;
            sprintf('PW: %g', data(idx,4)) ;
            sprintf('Index: %d', idx) ;
            sprintf('Class: %s', labels{clustIdx(idx)}) ;
        };
    end

end

Here is how it looks like in both 2D and 3D views (with different display styles):

screenshot_2D screenshot_3D

Upvotes: 5

Related Questions