Reputation: 41
I have a data.frame that looks like this:
value 1 | value 2 | value 3 | value 4
rock | pop | N/A | N/A
pop | hip hop | rap | blues
pop | punk | rock | funk
blues | punk | rap | N/A
I would like to create a matrix that counts how often each pair of values occurs together in a row, regardless of the columns they are in. In the example above, rows 1 and 3 both contain the combination of pop and rock. The number of filled columns may vary per row, and it may also change over time, as the data.frame will be updated frequently.
How would I create a matrix that looks something like this?
| rock | pop | punk
rock | 0 | 2 | 1
pop | 2 | 0 | 0
punk | 1 | 0 | 1
Apologies if the question or formatting isn't clear. This is my first question on Stack Overflow.
Upvotes: 2
Views: 248
Reputation: 193507
If I understand correctly, you should be able to do something like this:
ul <- sort(na.omit(unique(unlist(mydf, use.names = FALSE))))
ul
# [1] "blues" "funk" "hip hop" "pop" "punk" "rap" "rock"
tcrossprod(apply(mydf, 1, function(x) table(factor(x, ul))))
#         blues funk hip hop pop punk rap rock
# blues       2    0       1   1    1   2    0
# funk        0    1       0   1    1   0    1
# hip hop     1    0       1   1    0   1    0
# pop         1    1       1   3    1   1    2
# punk        1    1       0   1    2   1    1
# rap         2    0       1   1    1   2    0
# rock        0    1       0   2    1   0    2
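To see why the one-liner works, it may help to split it into its two steps. The following is only a sketch: the names `incidence` and `co` are introduced here for illustration, and the sample data is rebuilt with `data.frame()` so the snippet runs on its own.

```r
# Sample data with the same values as the question's table
mydf <- data.frame(
  value.1 = c("rock", "pop", "pop", "blues"),
  value.2 = c("pop", "hip hop", "punk", "punk"),
  value.3 = c(NA, "rap", "rock", "rap"),
  value.4 = c(NA, "blues", "funk", NA),
  stringsAsFactors = FALSE
)

# All distinct non-missing values, sorted
ul <- sort(na.omit(unique(unlist(mydf, use.names = FALSE))))

# Step 1: for each row of mydf, count how often each value occurs.
# The result has one row per value and one column per row of mydf.
incidence <- apply(mydf, 1, function(x) table(factor(x, ul)))

# Step 2: incidence %*% t(incidence). Entry [i, j] sums, over all rows of
# mydf, the product of the counts of value i and value j in that row.
co <- tcrossprod(incidence)

co["pop", "rock"]  # 2, because pop and rock share rows 1 and 3
```

This assumes a value appears at most once per row. If a row could repeat a value, the counts in `incidence` would exceed 1 and inflate the products.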
You can set the diagonal to 0 if required.
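A minimal sketch of that last step, assuming the result is stored in a variable (called `co` here, a name chosen for illustration). The `keep` vector is also hypothetical. It only shows how to pull out the subset of values the question's example displays.

```r
# Sample data with the same values as the question's table
mydf <- data.frame(
  value.1 = c("rock", "pop", "pop", "blues"),
  value.2 = c("pop", "hip hop", "punk", "punk"),
  value.3 = c(NA, "rap", "rock", "rap"),
  value.4 = c(NA, "blues", "funk", NA),
  stringsAsFactors = FALSE
)

ul <- sort(na.omit(unique(unlist(mydf, use.names = FALSE))))
co <- tcrossprod(apply(mydf, 1, function(x) table(factor(x, ul))))

# The diagonal holds how often each value occurs overall;
# zero it out so only pairs of different values are counted.
diag(co) <- 0

# Optionally restrict the matrix to a few values of interest
keep <- c("rock", "pop", "punk")
co[keep, keep]
```

Because the matrix carries the values as row and column names, the subset can be taken by name, which keeps working if new values are added to the data later.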
Sample data:
mydf <- structure(list(value.1 = c("rock", "pop", "pop", "blues"), value.2 = c("pop",
"hip hop", "punk", "punk"), value.3 = c(NA, "rap", "rock", "rap"
), value.4 = c(NA, "blues", "funk", NA)), .Names = c("value.1",
"value.2", "value.3", "value.4"), row.names = c(NA, 4L), class = "data.frame")
Upvotes: 1