Reputation: 53
I am looking for a fast and efficient way to populate a co-occurrence matrix (so to speak). Here is a sample of the data I am working with:
col1 col2
a e
a f
a e
b f
c g
a e
d f
a e
a g
b e
c e
And I want a matrix of the following form:
... e... f... g
a
b
c
d
where each entry is the frequency of the corresponding (row, column) pair.
For example, element (3,1) in the matrix corresponds to the frequency of the co-occurrence (c,e) and should have a value of 1, and element (1,1) should have a value of 4, corresponding to the 4 entries of (a,e) in the dataset.
I currently compute each entry individually using two nested for loops, and it takes an extremely long time to build the matrix (the actual data has about a million rows).
Upvotes: 2
Views: 1768
Reputation: 81683
This is a solution in R with table:
df <- read.table(text="col1 col2
a e
a f
a e
b f
c g
a e
d f
a e
a g
b e
c e", header = TRUE)
table(df)
    col2
col1 e f g
   a 4 1 1
   b 1 1 0
   c 1 0 1
   d 0 1 0
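For comparison, the same cross-tabulation can be sketched outside R. The Python below is only an illustration and not part of the answer above; the names pairs, counts, rows, cols and matrix are made up for the example. It tallies every (col1, col2) pair in a single pass with collections.Counter, which avoids the nested loops described in the question.

```python
from collections import Counter

# Sample data from the question as (col1, col2) pairs.
pairs = [("a", "e"), ("a", "f"), ("a", "e"), ("b", "f"), ("c", "g"),
         ("a", "e"), ("d", "f"), ("a", "e"), ("a", "g"), ("b", "e"),
         ("c", "e")]

# One pass over the data: each pair is hashed and counted once.
counts = Counter(pairs)

# Sorted row and column labels, mirroring the layout of table(df).
rows = sorted({r for r, _ in pairs})
cols = sorted({c for _, c in pairs})

# Dense matrix as a list of lists; pairs that never occur count as 0.
matrix = [[counts[(r, c)] for c in cols] for r in rows]

# Print the matrix with its column header and row labels.
print("  " + " ".join(cols))
for r, row in zip(rows, matrix):
    print(r, " ".join(str(v) for v in row))
```

With a million rows the counting step is still a single linear pass; only the final dense matrix grows with the number of distinct labels.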
Upvotes: 3
Reputation: 11810
You can use sparse to do exactly what you need:
spA = sparse(data(:,1), data(:,2), 1);
where data is your data, but as numbers. So you first have to convert the alphabetic labels to positive integer indices (stored as doubles).
sparse assembles row/column pairs from data(:,1) and data(:,2), adding 1 for every occurrence of a pair. Note, however, that if you expect the matrix to be symmetric, you might need to sum spA and its transpose, depending on your data.
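The two steps described here (turning labels into integer indices, then letting repeated index pairs add up) can be sketched as follows. This is a Python illustration of the idea, not MATLAB's sparse itself; index_labels, row_index, col_index and sparse_counts are hypothetical names, and the indices are 0-based, whereas MATLAB expects 1-based indices.

```python
from collections import defaultdict

def index_labels(labels):
    """Map each distinct label to an integer index in sorted order."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}

# Sample data from the question, one list per column.
col1 = ["a", "a", "a", "b", "c", "a", "d", "a", "a", "b", "c"]
col2 = ["e", "f", "e", "f", "g", "e", "f", "e", "g", "e", "e"]

row_index = index_labels(col1)  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
col_index = index_labels(col2)  # {'e': 0, 'f': 1, 'g': 2}

# Sparse storage: only nonzero cells are kept, keyed by (row, column).
# Repeated pairs accumulate, similar to how sparse() sums duplicates.
sparse_counts = defaultdict(int)
for r, c in zip(col1, col2):
    sparse_counts[(row_index[r], col_index[c])] += 1

print(sparse_counts[(row_index["a"], col_index["e"])])  # 4
```

Keeping only the nonzero cells matters when there are many distinct labels, since most (row, column) combinations never occur.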
Upvotes: 1