Populating a co-occurrence matrix

Question

I am looking for a fast and efficient way to populate a co-occurrence matrix, so to speak. Here is a sample of the data I am working with:

col1 col2
a e    
a f    
a e    
b f    
c g    
a e    
d f    
a e    
a g    
b e    
c e

And I want a matrix of the following form:

     e   f   g
a
b
c
d

where each entry is the number of times that (row, column) pair occurs in the data.

For example, element (3,1) of the matrix corresponds to the frequency of the pair (c,e) and should have a value of 1, while element (1,1) should have a value of 4, corresponding to the four (a,e) entries in the dataset.
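As a worked check, counting the eleven sample rows directly, with rows ordered a, b, c, d and columns ordered e, f, g, gives the matrix below. Note that the pair (a,e) appears four times in the sample as listed.

```latex
C =
\begin{pmatrix}
4 & 1 & 1 \\
1 & 1 & 0 \\
1 & 0 & 1 \\
0 & 1 & 0
\end{pmatrix}
\qquad
\text{rows } a,b,c,d;\ \text{columns } e,f,g
```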

I currently compute each entry individually using two nested for loops, and computing the matrix takes an extremely long time (the actual data has about a million rows).

angainor · Accepted Answer

You can use sparse to do exactly what you need:

spA = sparse(data(:,1), data(:,2), 1);

where data is your data encoded as numbers. sparse needs positive integer row and column indices, so you first have to convert the alphabetic labels to numeric indices (doubles).
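One way to do the conversion is with the third output of unique, which maps each label to its position in the sorted list of distinct labels. The sketch below assumes the two columns are held as cell arrays of character vectors named col1 and col2 (hypothetical names, filled here with the sample from the question); the same calls apply to the full million-row data.

```matlab
% Sample data from the question, one cell array per column.
col1 = {'a';'a';'a';'b';'c';'a';'d';'a';'a';'b';'c'};
col2 = {'e';'f';'e';'f';'g';'e';'f';'e';'g';'e';'e'};

% unique returns the sorted distinct labels and, as its third output,
% the index of every original element within that sorted list.
[rowLabels, ~, r] = unique(col1);   % rowLabels = {'a';'b';'c';'d'}
[colLabels, ~, c] = unique(col2);   % colLabels = {'e';'f';'g'}

% sparse places a 1 at (r(k), c(k)) for every row k and sums duplicates,
% so each entry ends up holding the count of that pair.
spA = sparse(r, c, 1);

disp(full(spA))
```

rowLabels and colLabels record which label each row and column of spA stands for, so the counts can be mapped back to the original characters.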

sparse assembles the matrix from the row/column pairs in data(:,1) and data(:,2), adding 1 for every occurrence of a pair, so repeated pairs are summed automatically. Note, however, that if you expect the matrix to be symmetric, you might need to sum spA and its transpose, depending on your data.
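A sketch of the symmetric case, assuming both columns draw from the same vocabulary so that (x,y) and (y,x) should count as the same pair. The variable names and sample values are hypothetical. Both columns are encoded against one shared label list, and the matrix is forced to be square so that spA and its transpose have matching sizes.

```matlab
% Hypothetical pair data in which both columns use the same labels.
p1 = {'x';'y';'x';'z'};
p2 = {'y';'x';'z';'z'};

% Encode both columns against a single sorted label list so that an
% index means the same label whether it is used as a row or a column.
[labels, ~, idx] = unique([p1; p2]);
n  = numel(p1);
ri = idx(1:n);        % row indices from the first column
ci = idx(n+1:end);    % column indices from the second column

% Build an m-by-m matrix so that spA and spA' can be added.
m   = numel(labels);
spA = sparse(ri, ci, 1, m, m);

% Count (x,y) and (y,x) together. Self-pairs such as (z,z) sit on the
% diagonal and are therefore counted twice by this sum.
spSym = spA + spA';

disp(full(spSym))
```

If doubled diagonal entries are unwanted, they can be halved afterwards, for example with spSym - diag(diag(spA)).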
