Martijn

Reputation: 41

Calculate unique combinations between values in multiple columns of a data.frame in R

I have a data.frame that looks like this:

  value 1 | value 2 | value 3 | value 4
   rock   |    pop  |    N/A  |   N/A
   pop    | hip hop |    rap  |   blues
   pop    |    punk |    rock |   funk
   blues  |    punk |    rap  |   N/A

I would like to create a matrix that counts the unique combinations of values, regardless of the column they are in. In the example above, rows 1 and 3 both contain the combination of pop and rock. The number of filled-in columns may vary per row, and it may also change over time, because the data.frame is updated frequently.

How would I create a matrix that looks something like this?

          | rock    | pop     | punk
   rock   |    0    |    2    |   1
   pop    |    2    |    0    |   0
   punk   |    1    |    0    |   1

Apologies if the question or formatting isn't clear. This is my first question on Stack Overflow.

Upvotes: 2

Views: 248

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

If I understand correctly, you should be able to do something like this:

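# All distinct values across every column, sorted, with NA dropped
# (mydf is defined under "Sample data" below)
ul <- sort(na.omit(unique(unlist(mydf, use.names = FALSE))))
ul
# [1] "blues"   "funk"    "hip hop" "pop"     "punk"    "rap"     "rock"

# Tabulate each row against the full set of values, then multiply the
# resulting count matrix by its transpose to get pairwise co-occurrences
tcrossprod(apply(mydf, 1, function(x) table(factor(x, ul))))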
#         blues funk hip hop pop punk rap rock
# blues       2    0       1   1    1   2    0
# funk        0    1       0   1    1   0    1
# hip hop     1    0       1   1    0   1    0
# pop         1    1       1   3    1   1    2
# punk        1    1       0   1    2   1    1
# rap         2    0       1   1    1   2    0
# rock        0    1       0   2    1   0    2

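The diagonal shows how many rows contain each value (for example, pop appears in 3 rows), assuming no value repeats within a row. You can set the diagonal to 0 if you only want counts for pairs of different values.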
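Zeroing the diagonal could look like the sketch below. It repeats the sample data so that it runs on its own; the names `counts` and `co` are only illustrative and do not appear in the code above.

```r
# Sample data, same values as in the answer
mydf <- data.frame(
  value.1 = c("rock", "pop", "pop", "blues"),
  value.2 = c("pop", "hip hop", "punk", "punk"),
  value.3 = c(NA, "rap", "rock", "rap"),
  value.4 = c(NA, "blues", "funk", NA),
  stringsAsFactors = FALSE
)

# Distinct values; sort() drops NA by default
ul <- sort(unique(unlist(mydf, use.names = FALSE)))

# One column of counts per row of mydf, one row per distinct value
counts <- apply(mydf, 1, function(x) table(factor(x, levels = ul)))

# Pairwise co-occurrence counts ("co" is an illustrative name)
co <- tcrossprod(counts)

# Drop the self-counts on the diagonal
diag(co) <- 0

co["pop", "rock"]
# [1] 2
```

Only the diagonal changes; the off-diagonal counts are the same as in the output above.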

Sample data:

mydf <- structure(list(value.1 = c("rock", "pop", "pop", "blues"), value.2 = c("pop", 
    "hip hop", "punk", "punk"), value.3 = c(NA, "rap", "rock", "rap"
    ), value.4 = c(NA, "blues", "funk", NA)), .Names = c("value.1", 
    "value.2", "value.3", "value.4"), row.names = c(NA, 4L), class = "data.frame")
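Since the question mentions that the data.frame is updated frequently, the same steps could be wrapped in a small helper and re-run on each new version. This is only a sketch: the function name `cooccurrence` and its arguments are hypothetical, and it assumes that missing entries may arrive either as real `NA` or as the literal string "N/A" shown in the question's table.

```r
# Hypothetical helper; the name and arguments are illustrative only
cooccurrence <- function(df, na_strings = "N/A", zero_diag = TRUE) {
  m <- as.matrix(df)                 # character matrix, one row per record
  m[m %in% na_strings] <- NA         # assumption: "N/A" strings mean missing
  lv <- sort(unique(as.vector(m)))   # distinct values; sort() drops NA
  # One column of counts per row of df, one row per distinct value
  counts <- apply(m, 1, function(x) table(factor(x, levels = lv)))
  out <- tcrossprod(counts)
  if (zero_diag) diag(out) <- 0
  out
}

# Usage with a small made-up data.frame
df <- data.frame(a = c("rock", "pop"), b = c("pop", "N/A"),
                 stringsAsFactors = FALSE)
res <- cooccurrence(df)
res["rock", "pop"]
# [1] 1
```

The sketch assumes at least two distinct values; with a single distinct value `apply()` returns a vector rather than a matrix, so that case would need separate handling.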

Upvotes: 1
