user1737564

Reputation: 53

Populating a co-occurrence matrix

I am looking for a fast and efficient way to populate a co-occurrence matrix (so to speak). Here is a sample of the data I am working with:

col1 col2
a e    
a f    
a e    
b f    
c g    
a e    
d f    
a e    
a g    
b e    
c e

And I want a matrix of the following form:

     e    f    g
a
b
c
d

with the corresponding entry relating to the frequency.

For example, element (3,1) of the matrix is the frequency of the co-occurrence of (c,e) and should have a value of 1, while element (1,1) should have a value of 4, corresponding to the 4 entries of (a,e) in the dataset.

I am currently computing each entry individually with two nested for loops, and it takes an extremely long time to build the matrix (the actual data has about a million rows).

Upvotes: 2

Views: 1768

Answers (2)

Sven Hohenstein

Reputation: 81683

Here is a solution in R using table():

df <- read.table(text="col1 col2
a e    
a f    
a e    
b f    
c g    
a e    
d f    
a e    
a g    
b e    
c e", header = TRUE)

table(df)

    col2
col1 e f g
   a 4 1 1
   b 1 1 0
   c 1 0 1
   d 0 1 0

Upvotes: 3

angainor

Reputation: 11810

You can use sparse to do exactly what you need:

spA = sparse(data(:,1), data(:,2), 1);

where data holds your data as numbers: sparse needs positive integer row and column indices, so you first have to convert the alphabetic labels to such indices (stored as doubles).

sparse assembles the matrix from the row/column pairs in data(:,1) and data(:,2), adding 1 for every occurrence of a pair. Note, however, that if you expect the matrix to be symmetric, you might need to sum spA and its transpose, depending on your data.
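
As a rough sketch of the conversion step, assuming the two columns are available as cell arrays of strings (the variable names col1, col2, rowLabels, colLabels are made up for illustration, and how you read your file into them is up to you), unique can produce the integer indices that sparse needs:

```matlab
% Hypothetical sample data: the labels from the question as cell arrays.
col1 = {'a';'a';'a';'b';'c';'a';'d';'a';'a';'b';'c'};
col2 = {'e';'f';'e';'f';'g';'e';'f';'e';'g';'e';'e'};

% unique returns the sorted distinct labels and, as its third output,
% the position of each entry's label in that sorted list.
[rowLabels, ~, i] = unique(col1);
[colLabels, ~, j] = unique(col2);

% sparse sums the values of repeated (i, j) pairs, which gives the counts.
spA = sparse(i, j, 1, numel(rowLabels), numel(colLabels));

% Dense view, rows ordered as rowLabels and columns as colLabels.
full(spA)
```

For the sample data this should give the same counts as the R table in the other answer, with rows a to d and columns e to g. Passing the sizes explicitly is optional here; it only makes the dimensions obvious.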

Upvotes: 1
