Reputation: 685
I have the following dataframe:
| Document | CatA | CatB | CatC | CatD |
|----------|------|------|------|------|
| A | 1 | 0 | 1 | 1 |
| B | 0 | 1 | 1 | 0 |
| C | 1 | 1 | 0 | 1 |
indicating that categories CatA, CatC, and CatD co-occur in document A, etc.
I need to calculate the category co-occurrence matrix over all documents, for example, as follows:
| | CatA | CatB | CatC | CatD |
|------|------|------|------|------|
| CatA | NA | 1 | 1 | 2 |
| CatB | 1 | NA | 1 | 1 |
| CatC | 1 | 1 | NA | 1 |
| CatD | 2 | 1 | 1 | NA |
Upvotes: 0
Views: 314
Reputation: 6171
If your dataframe contains only zeros and ones, you can generate the co-occurrence matrix directly in base R using the crossprod() function:
    # One column per category (CatA..CatD), one row per document (A..C)
    x <- cbind(c(1, 0, 1), c(0, 1, 1), c(1, 1, 0), c(1, 0, 1))
    crossprod(x)
which produces
         [,1] [,2] [,3] [,4]
    [1,]    2    1    1    2
    [2,]    1    2    1    1
    [3,]    1    1    2    1
    [4,]    2    1    1    2
The diagonal, which holds the number of documents each category appears in, can then be set to NA using
    res <- crossprod(x)
    diag(res) <- NA
    res

         [,1] [,2] [,3] [,4]
    [1,]   NA    1    1    2
    [2,]    1   NA    1    1
    [3,]    1    1   NA    1
    [4,]    2    1    1   NA
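
To keep the category names on the rows and columns, the same approach can be applied to the data frame itself. The following is a minimal sketch, assuming the data frame is called df (a name not given in the question) and that Document is its only non-numeric column:

```r
# Rebuild the data frame from the question; the name `df` is an assumption.
df <- data.frame(
  Document = c("A", "B", "C"),
  CatA = c(1, 0, 1),
  CatB = c(0, 1, 1),
  CatC = c(1, 1, 0),
  CatD = c(1, 0, 1)
)

# Drop the Document column and convert the 0/1 columns to a numeric matrix.
m <- as.matrix(df[, -1])

# crossprod(m) computes t(m) %*% m, so entry (i, j) is the number of
# documents in which category i and category j are both 1.
cooc <- crossprod(m)

# The diagonal counts each category's own occurrences; replace it with NA.
diag(cooc) <- NA
cooc
```

Because crossprod() carries the column names over to both dimensions, the result should be labelled CatA to CatD on rows and columns, matching the layout requested in the question.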
Upvotes: 2