AnnaSchumann

Tallying coincidences of numbers in the columns of a matrix - MATLAB

I have a matrix A of the following form (much larger in reality):

205   204   201
202   208   202

How can I tally the coincidences of numbers within each column and output the tallies as a matrix?

I'd like the final matrix to span min(A(:)):max(A(:)) (or a range I specify) both across the top and down the side, with each entry counting how often that pair of numbers occurs together in a column. Using the above example:

    200 201 202 203 204 205 206 207 208
200  0   0   0   0   0   0   0   0   0
201  0   0   1   0   0   0   0   0   0
202  0   0   0   0   0   1   0   0   0 
203  0   0   0   0   0   0   0   0   0
204  0   0   0   0   0   0   0   0   1
205  0   0   0   0   0   0   0   0   0
206  0   0   0   0   0   0   0   0   0
207  0   0   0   0   0   0   0   0   0
208  0   0   0   0   0   0   0   0   0

(Matrix labels are not required)

Two important points: the tallying must not double-count a pair, and it must follow numerical order. For example, a column containing:

205
202

should be tallied as 202 occurring with 205 (row 202, column 205 in the matrix above), but NOT as 205 occurring with 202, the duplicate reciprocal. The smaller number of each pair is the reference, i.e. it selects the row.
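To pin the rule down, here is a plain-loop version of the intended tally. It is a minimal sketch added for illustration, not part of the original question; it assumes A has exactly two rows, and the range vector `r` and tally matrix `T` are hypothetical names.

```matlab
% Loop-based reference tally (illustrative sketch; assumes A has two rows)
A = [205 204 201
     202 208 202];
r = 200:208;                         % hypothetical range for rows and columns
T = zeros(numel(r));                 % numel(r)-by-numel(r) tally matrix
for k = 1:size(A, 2)
    p = sort(A(:, k));               % ascending: smaller value first
    if p(1) >= r(1) && p(2) <= r(end)
        row = p(1) - r(1) + 1;       % smaller value is the reference (row)
        col = p(2) - r(1) + 1;       % larger value is the column
        T(row, col) = T(row, col) + 1;
    end
end
disp(T)
```

The loop only makes the "smaller value is the row" convention explicit; the answers below do the same thing without a loop.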

Answers (3)

rayryeng

What about a solution using accumarray? I would first sort each column independently so that the smaller value is on top, then use the first row as the row subscript and the second row as the column subscript into the final accumulation matrix. Something like:

limits = 200:208;
A = A(:,all(A>=min(limits)) & all(A<=max(limits))); %// Borrowed from Divakar

%// Sort the columns individually and bring down to 1-indexing
B = sort(A, 1) - limits(1) + 1;

%// Create co-occurrence matrix
C = accumarray(B.', 1, [numel(limits) numel(limits)]);

With:

A = [205   204   201
     202   208   202]

This is the output:

C =

     0     0     0     0     0     0     0     0     0
     0     0     1     0     0     0     0     0     0
     0     0     0     0     0     1     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0

With a repeated pair (example borrowed from Luis Mendo):

A = [205   204   201
     201   208   205]

Output:

C =

     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
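The count of 2 comes from accumarray summing the value 1 at every repeated subscript pair. The subscripts it receives for the duplicate example can be inspected directly; this is an illustrative sketch reusing the code above, with `subs` a hypothetical variable name.

```matlab
A = [205   204   201
     201   208   205];
limits = 200:208;
B = sort(A, 1) - limits(1) + 1;      % columns sorted, shifted to 1-based indices
subs = B.'                           % one (row, column) pair per column of A
% Pair (2,6) appears twice, so accumarray adds two 1s at C(2,6)
C = accumarray(subs, 1, [numel(limits) numel(limits)]);
```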

Luis Mendo

sparse to the rescue!

Let your data and desired range be defined as

A = [ 205   204   201
      202   208   202 ]; %// data. Two-row matrix
limits = [200 208]; %// desired range. It needn't include all values of A

Then

lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all((A>=limits(1)) & (A<=limits(2)), 1);
B = sort(A(:,cols), 1, 'descend')-lim1;
R = full(sparse(B(2,:), B(1,:), 1, s, s));

gives

R =
     0     0     0     0     0     0     0     0     0
     0     0     1     0     0     0     0     0     0
     0     0     0     0     0     1     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0

Alternatively, you can dispense with sort and use matrix addition followed by triu to obtain the same result (possibly faster):

lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all( (A>=limits(1)) & (A<=limits(2)) , 1);
R = full(sparse(A(2,cols)-lim1, A(1,cols)-lim1, 1, s, s));
R = triu(R + R.');

Both approaches handle repeated columns, including columns that match only after sorting (such as 205/201 and 201/205), correctly incrementing their tally. For example,

A = [205   204   201
     201   208   205]

gives

R =
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
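One caveat with the triu variant, assuming a column can hold the same value twice (e.g. 202 over 202): such a pair lands on the diagonal, and R + R.' doubles it, so it would be tallied as 2. The sort-based version does not have this issue. A possible adjustment, sketched here and not part of the original answer, keeps the strict upper triangle of R + R.' and adds the diagonal of R once:

```matlab
% Illustrative data: the last column holds the same value twice
A = [205 204 202
     201 208 202];
limits = [200 208];
lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all((A>=limits(1)) & (A<=limits(2)), 1);
R = full(sparse(A(2,cols)-lim1, A(1,cols)-lim1, 1, s, s));
% Fold the lower triangle onto the upper one, counting the diagonal only once
R = triu(R + R.', 1) + diag(diag(R));
```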

Divakar

See if this is what you were after -

range1 = 200:208 %// Set the range

A = A(:,all(A>=min(range1)) & all(A<=max(range1))) %// keep only the columns of A
                                                   %// whose values fall within range1
A_off = A-range1(1)+1 %// Get the offset (1-based) indices from A

A_off_sort = sort(A_off,1) %// sort offset indices to satisfy "smallest" criteria

out = zeros(numel(range1)); %// storage for output matrix
idx = sub2ind(size(out),A_off_sort(1,:),A_off_sort(2,:)) %// get the indices to be set

unqidx = unique(idx)
out(unqidx) = histc(idx,unqidx) %// set coincidences

With

A = [205   204   201
     201   208   205]

this gives:

out =
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
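Recent MATLAB documentation lists histc as not recommended (histcounts is the suggested replacement). If that matters for your release, the unique/histc step can instead be done with an accumarray call over the linear indices. This is a sketch of that swap, not the original answer's code; `n` is a hypothetical shorthand for numel(range1).

```matlab
A = [205   204   201
     201   208   205];
range1 = 200:208;
A = A(:,all(A>=min(range1)) & all(A<=max(range1)));
A_off_sort = sort(A-range1(1)+1, 1);     % smaller offset index on top
n = numel(range1);
idx = sub2ind([n n], A_off_sort(1,:), A_off_sort(2,:));
% accumarray adds a 1 for every occurrence of each linear index
out = reshape(accumarray(idx(:), 1, [n*n 1]), n, n);
```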

A few performance-oriented tricks could be used here:

I. Replace

out = zeros(numel(range1)); 

with

out(numel(range1),numel(range1)) = 0;

II. Replace

idx = sub2ind(size(out),A_off_sort(1,:),A_off_sort(2,:))  

with

idx = (A_off_sort(2,:)-1)*numel(range1)+A_off_sort(1,:)
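Trick II relies on MATLAB storing matrices in column-major order: for an n-by-n matrix, the linear index of (row, col) is (col-1)*n + row. sub2ind computes the same value but also validates the subscripts, which the manual formula skips. Trick I gives an all-zero matrix only when out does not exist yet. A quick check of both, using the subscripts from the duplicate example above as illustrative values:

```matlab
n = 9;                          % numel(range1) in the answer above
r = [2 5 2];                    % row subscripts, i.e. A_off_sort(1,:)
c = [6 9 6];                    % column subscripts, i.e. A_off_sort(2,:)
idx1 = sub2ind([n n], r, c);    % built-in conversion (with range checks)
idx2 = (c-1)*n + r;             % trick II: column-major formula, no checks
% Trick I: assigning past the end of a nonexistent variable zero-fills it
clear out
out(n,n) = 0;
```

If an older, larger out is still in the workspace, out(n,n) = 0 only sets that one element and leaves the other existing values in place, so clear or reinitialize it first.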
