jarhead

Reputation: 1901

Cluster analysis on a 1D vector

Consider the following data:

A = [-1 -1 -1 0 1 -1 -1 0 0 1 1 1 1 -1 1 0 1];

How can the size and frequency of clusters in A (runs of identical neighboring values) be calculated, preferably using built-in MATLAB commands?

The result should read something like

s_plus = [1 2 3 4 5 ; 3 0 0 1 0]'; % accounts for (1,1,1,1) and (1),(1),(1), which appear in A
s_zero = [1 2 3 4 5 ; 2 1 0 0 0]'; % accounts for (0,0) and (0),(0), which appear in A
s_mins = [1 2 3 4 5 ; 1 1 1 0 0]'; % accounts for (-1), (-1,-1), and (-1,-1,-1), which appear in A

In the above, the first column is the cluster size and the second column is the appearance frequency.

Upvotes: 1

Views: 126

Answers (1)

Wolfie

Reputation: 30046

You can use run length encoding to transform your input array into two arrays

  1. The value of a group (or "run" of equal values)
  2. The number of elements in that group
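
As a small illustration of that encoding, here is a sketch on a hypothetical six-element vector `x` (not taken from the question). The names `ends`, `vals` and `counts` are illustrative only.

```matlab
x = [7 7 7 2 2 9];                                  % hypothetical input
ends   = [find(x(1:end-1) ~= x(2:end)), numel(x)];  % last index of each run
counts = diff([0, ends]);                           % number of elements in each run
vals   = x(ends);                                   % value of each run
% vals   is [7 2 9]
% counts is [3 2 1]
```

The two encoded arrays always have the same length, one entry per run.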

Then you can convert this into your desired output by checking when two conditions are true

  1. The values array matches the value you want (-1,0,1)
  2. The group size matches 1..5
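
The two conditions can be combined into a single logical mask. The following is a minimal sketch on hypothetical encoded arrays (`vals`, `counts`, `target`, `sizes`, `mask` and `freq` are illustrative names); it assumes MATLAB R2016b or later, where implicit expansion lets a row vector be compared against a column vector.

```matlab
vals   = [-1 0 1 -1 0 1];   % hypothetical run values
counts = [ 3 1 1  2 2 4];   % hypothetical run lengths
target = 1;                 % value of interest
sizes  = (1:5).';           % candidate run lengths, as a column

% Row k of mask flags the runs that have value `target` AND length k
mask = (vals == target) & (counts == sizes);   % 5-by-6 logical
freq = sum(mask, 2).';                         % number of matching runs per length
% freq is [1 0 0 1 0]
```

On older releases the same comparison could be written with `bsxfun(@eq, counts, sizes)`.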

This might sound a bit tricky, but it's only a few lines of code, and it should be relatively fast even for large arrays, because the outputs are calculated from the "encoded" arrays, which are never larger than the input array.

Here is the code; see the comments for details:

A = [-1 -1 -1 0 1 -1 -1 0 0 1 1 1 1 -1 1 0 1]; % Example input

% Run length encoding step
idx = [ find( A(1:end-1) ~= A(2:end) ), numel(A) ]; % Find group end points (last index of each group)
count = diff([0, idx]); % Find number of elements in each group
val = A( idx );         % Get value of each group
% Helper function to go from "val" and "count" to desired output format
% by checking value = target and group size matches 1 to 5, counting matching groups. 
f = @(v) sum(val==v & count==(1:5).',2).';
% Create outputs
s_plus = f(1);  % = [3 0 0 1 0]
s_zero = f(0);  % = [2 1 0 0 0]
s_mins = f(-1); % = [1 1 1 0 0]
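
The outputs above hold only the frequencies. If the two-column layout from the question (cluster size, then frequency) is wanted, one possible variation is sketched below; the helper `g` and the variable `sizes` are names introduced here for illustration, and implicit expansion (R2016b or later) is assumed.

```matlab
A = [-1 -1 -1 0 1 -1 -1 0 0 1 1 1 1 -1 1 0 1]; % Example input

% Run length encoding step
idx   = [ find( A(1:end-1) ~= A(2:end) ), numel(A) ]; % Last index of each group
count = diff([0, idx]);                               % Number of elements in each group
val   = A( idx );                                     % Value of each group

sizes = (1:5).';                                      % Cluster sizes to report
% Helper returning a 5-by-2 matrix: [cluster size, appearance frequency]
g = @(v) [sizes, sum(val==v & count==sizes, 2)];

s_plus = g(1);  % = [1 2 3 4 5 ; 3 0 0 1 0]'
s_zero = g(0);  % = [1 2 3 4 5 ; 2 1 0 0 0]'
s_mins = g(-1); % = [1 2 3 4 5 ; 1 1 1 0 0]'
```

Replacing `(1:5).'` with `(1:max(count)).'` would size the table to the longest run actually present instead of a fixed maximum of 5.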

Upvotes: 3
