How to count common concepts and store the result in a matrix?

Question

I want to create a matrix of 1s and 0s that records which terms occur in which column. I have already built a matrix that counts the terms each pair of columns has in common (e.g. with rows like 1,4,2), but I cannot figure out how to disaggregate it.

Here is a toy, reproducible example. Steps (1) to (4) work; step (5) is the one I cannot do at the moment.

(1) I have this (fictional) dataset

vec1 <- c("apple","pear","apple and pear")
vec2 <- c("apple and pear","banana","orange")
vec3 <- c("orange and pear","banana","apple")

my.data.frame <- as.data.frame(cbind(vec1,vec2,vec3))
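A slightly more direct way to build the same data frame, offered as a sketch: data.frame() takes the vectors as they are and keeps their names as column names, so the cbind() detour (which first builds a character matrix) is not needed. Passing stringsAsFactors = FALSE is an assumption about what you want; it keeps the columns as character vectors even on R versions before 4.0.0, where the default was TRUE.

```r
vec1 <- c("apple", "pear", "apple and pear")
vec2 <- c("apple and pear", "banana", "orange")
vec3 <- c("orange and pear", "banana", "apple")

# data.frame() names the columns vec1, vec2, vec3 from the argument names;
# stringsAsFactors = FALSE keeps them as character (not factor) columns
my.data.frame <- data.frame(vec1, vec2, vec3, stringsAsFactors = FALSE)

str(my.data.frame)
```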

            vec1           vec2            vec3
1          apple apple and pear orange and pear
2           pear         banana          banana
3 apple and pear         orange           apple

(2) I extract the variables and the content

vectors.list <- as.vector(colnames(my.data.frame))

list.of.fruits <- unique(as.vector(unlist(my.data.frame)))

(3) I write a function to count common terms (adapted from this post: How to count common words and store the result in a matrix?)

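common.fruits <- function(vList) {
  # lower-case every vector so that "Apple" and "apple" count as the same term
  v <- lapply(vList, tolower)
  # entry [i, j] = number of distinct terms shared by vectors i and j
  do.call(rbind, lapply(v, function(x) {
    do.call(c, lapply(v, function(y) length(intersect(x, y))))
  }))
}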
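A quick check of common.fruits on two small vectors (a and b are hypothetical, made up for illustration). Because intersect() returns distinct values, each diagonal entry is the number of distinct terms in that vector, and each off-diagonal entry is the number of terms two vectors share after lower-casing.

```r
common.fruits <- function(vList) {
  v <- lapply(vList, tolower)
  do.call(rbind, lapply(v, function(x) {
    do.call(c, lapply(v, function(y) length(intersect(x, y))))
  }))
}

# hypothetical inputs: "Apple" and "apple" match after tolower()
a <- c("Apple", "kiwi")
b <- c("apple", "banana", "kiwi")

m <- common.fruits(list(a, b))
m
```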

(4) I use get and lapply to fetch each vector by name and pass them all to the function (an efficient calculation, I think)

compare <- lapply(vectors.list,get)
common.terms.matrix <- common.fruits(compare)
rownames(common.terms.matrix) <- vectors.list
colnames(common.terms.matrix) <- vectors.list
common.terms.matrix

     vec1 vec2 vec3
vec1    3    1    1
vec2    1    3    1
vec3    1    1    3
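If you would rather avoid get() (which relies on vec1, vec2 and vec3 still existing in the workspace), one alternative, sketched here, is to pass the data frame's columns directly. lapply() over a data frame returns a named list, and those names carry through common.fruits to the row and column names, so the separate rownames/colnames assignments are no longer needed.

```r
vec1 <- c("apple", "pear", "apple and pear")
vec2 <- c("apple and pear", "banana", "orange")
vec3 <- c("orange and pear", "banana", "apple")
my.data.frame <- data.frame(vec1, vec2, vec3, stringsAsFactors = FALSE)

common.fruits <- function(vList) {
  v <- lapply(vList, tolower)
  do.call(rbind, lapply(v, function(x) {
    do.call(c, lapply(v, function(y) length(intersect(x, y))))
  }))
}

# named list of character columns: vec1, vec2, vec3
compare <- lapply(my.data.frame, as.character)
common.terms.matrix <- common.fruits(compare)
common.terms.matrix
```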

(5) How do I disaggregate that last matrix into the following matrix or data.frame? (I wrote it by hand; the "|" separators are not R output.)

     | apple | pear | apple and pear | banana | orange | orange and pear
vec1 | 1     | 1    | 1              | 0      | 0      | 0
vec2 | 0     | 0    | 1              | 1      | 1      | 0
vec3 | 1     | 0    | 0              | 1      | 0      | 1

Roman · Accepted Answer

You can try

table(col(my.data.frame), as.matrix(my.data.frame))
    apple apple and pear banana orange orange and pear pear
  1     1              1      0      0               0    1
  2     0              1      1      1               0    0
  3     1              0      1      0               1    0
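To get vec1, vec2 and vec3 as row labels instead of 1, 2 and 3, one option (a sketch, not part of the original answer) is to use the column indices returned by col() to look up the column names before tabulating.

```r
vec1 <- c("apple", "pear", "apple and pear")
vec2 <- c("apple and pear", "banana", "orange")
vec3 <- c("orange and pear", "banana", "apple")
my.data.frame <- data.frame(vec1, vec2, vec3, stringsAsFactors = FALSE)

# col() gives the column index of every cell (1, 2 or 3);
# indexing names() with it turns those indices into "vec1", "vec2", "vec3"
tab <- table(names(my.data.frame)[col(my.data.frame)], as.matrix(my.data.frame))
tab
```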

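Rows 1 to 3 correspond to vec1 to vec3, and the columns come out in alphabetical order. If you need a different layout, index the table by row or column names.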
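For example, a sketch that reorders the columns to match the hand-written layout in the question, assuming list.of.fruits from step (2) (its first-appearance order happens to match that layout). The last part cross-checks the result with a different technique, %in% on each column, which builds the same incidence matrix without table(). One difference to keep in mind: table() counts occurrences, so a term repeated within one column would give 2, while the %in% version always gives 0 or 1.

```r
vec1 <- c("apple", "pear", "apple and pear")
vec2 <- c("apple and pear", "banana", "orange")
vec3 <- c("orange and pear", "banana", "apple")
my.data.frame <- data.frame(vec1, vec2, vec3, stringsAsFactors = FALSE)
list.of.fruits <- unique(as.vector(unlist(my.data.frame)))

tab <- table(names(my.data.frame)[col(my.data.frame)], as.matrix(my.data.frame))
# reorder the columns by name into first-appearance order
tab <- tab[, list.of.fruits]
tab

# cross-check: for each column, flag which of the terms occur in it (1/0)
inc <- t(sapply(my.data.frame, function(column) as.integer(list.of.fruits %in% column)))
colnames(inc) <- list.of.fruits
inc
```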
