nik

Reputation: 2584

How to make a combination based on strings

I have a data frame with many columns, like the one below:

Column1           Column2           Column3
Q9Y6Y8             P28074           Q9Y6A4
Q9Y6W5             P28066           Q9Y623
Q9Y6H1             P27695           Q9Y5W9
Q5T1J5             P25786;Q9Y623 
Q9Y6A4
Q9Y623;P27695;Q9Y623
Q9Y5W9
Q9Y6Y8

First I want to pool all the values together and keep only the unique ones, like below:

Q9Y6Y8                        
Q9Y6W5                     
Q9Y6H1                       
Q5T1J5             
Q9Y6A4
Q9Y623
P27695
Q9Y623
Q9Y5W9
Q9Y6Y8 
P25786
P28074
P28066   

Then I want every combination of the strings taken two at a time, like below:

Q9Y6Y8 Q9Y6W5   
Q9Y6Y8 Q9Y6H1                       
Q9Y6Y8 Q9Y6A4                           
Q9Y6Y8 Q5T1J5             
Q9Y6Y8 Q9Y6A4
Q9Y6Y8 Q9Y623
Q9Y6Y8 P27695
Q9Y6Y8 Q9Y623
    .
    .
    .
Q9Y6W5 Q9Y6H1
Q9Y6W5 Q9Y6A4
Q9Y6W5 Q5T1J5 
    .
    .
    .

and so on, until every pair of strings appears once.

Upvotes: 1

Views: 55

Answers (1)

akrun

Reputation: 886948

A data.frame is a list, so unlist flattens it into a single vector. We can then split each element on ";" with strsplit, unlist the resulting list, and take the unique elements.

Un1 <- unique(unlist(strsplit(unlist(df1), ";")))
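To check this on the question's data, the data frame can be rebuilt roughly as follows. This is only a sketch: the question does not show how the shorter columns are padded, so the empty strings here are an assumption, and stringsAsFactors = FALSE is set to keep the columns as character.

```r
# Hypothetical reconstruction of the question's data; short columns padded with ""
df1 <- data.frame(
  Column1 = c("Q9Y6Y8", "Q9Y6W5", "Q9Y6H1", "Q5T1J5", "Q9Y6A4",
              "Q9Y623;P27695;Q9Y623", "Q9Y5W9", "Q9Y6Y8"),
  Column2 = c("P28074", "P28066", "P27695", "P25786;Q9Y623", "", "", "", ""),
  Column3 = c("Q9Y6A4", "Q9Y623", "Q9Y5W9", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Flatten column by column, split multi-ID cells on ";", flatten again, deduplicate
Un1 <- unique(unlist(strsplit(unlist(df1), ";")))
length(Un1)  # 11
```

The empty cells drop out on their own, because strsplit returns a zero-length vector for an empty string. Cells holding NA would stay in the result as NA and would need to be removed separately.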

From this, expand.grid gives every ordered pair, which includes each string paired with itself and both orderings of every other pair:

expand.grid(Un1, Un1)
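A small illustration of what expand.grid returns, using a hypothetical three-element vector ids in place of Un1. The filter at the end is one possible way to reduce the result to one row per unordered pair; it is an illustration, not part of the original answer.

```r
ids <- c("A", "B", "C")  # hypothetical stand-in for Un1

# Every ordered pair: 3 * 3 = 9 rows, including A-A and both A-B and B-A
all_pairs <- expand.grid(Var1 = ids, Var2 = ids, stringsAsFactors = FALSE)
nrow(all_pairs)  # 9

# Keep one row per unordered pair by requiring the first ID to sort before the second
distinct_pairs <- all_pairs[all_pairs$Var1 < all_pairs$Var2, ]
nrow(distinct_pairs)  # 3
```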

If each pair should appear only once, with no self-pairs and no repeated orderings (as in the expected output), combn can be used instead.

t(combn(Un1, 2))
#        [,1]     [,2]    
# [1,] "Q9Y6Y8" "Q9Y6W5"
# [2,] "Q9Y6Y8" "Q9Y6H1"
# [3,] "Q9Y6Y8" "Q5T1J5"
# [4,] "Q9Y6Y8" "Q9Y6A4"
# [5,] "Q9Y6Y8" "Q9Y623"
# [6,] "Q9Y6Y8" "P27695"
# [7,] "Q9Y6Y8" "Q9Y5W9"
# [8,] "Q9Y6Y8" "P28074"
# [9,] "Q9Y6Y8" "P28066"
#[10,] "Q9Y6Y8" "P25786"
#[11,] "Q9Y6W5" "Q9Y6H1"
#[12,] "Q9Y6W5" "Q5T1J5"
#[13,] "Q9Y6W5" "Q9Y6A4"
#[14,] "Q9Y6W5" "Q9Y623"
#[15,] "Q9Y6W5" "P27695"
#[16,] "Q9Y6W5" "Q9Y5W9"
#[17,] "Q9Y6W5" "P28074"
#[18,] "Q9Y6W5" "P28066"
#[19,] "Q9Y6W5" "P25786"
#[20,] "Q9Y6H1" "Q5T1J5"
#[21,] "Q9Y6H1" "Q9Y6A4"
#[22,] "Q9Y6H1" "Q9Y623"
#[23,] "Q9Y6H1" "P27695"
#[24,] "Q9Y6H1" "Q9Y5W9"
#[25,] "Q9Y6H1" "P28074"
#[26,] "Q9Y6H1" "P28066"
#[27,] "Q9Y6H1" "P25786"
#[28,] "Q5T1J5" "Q9Y6A4"
#[29,] "Q5T1J5" "Q9Y623"
#[30,] "Q5T1J5" "P27695"
#[31,] "Q5T1J5" "Q9Y5W9"
#[32,] "Q5T1J5" "P28074"
#[33,] "Q5T1J5" "P28066"
#[34,] "Q5T1J5" "P25786"
#[35,] "Q9Y6A4" "Q9Y623"
#[36,] "Q9Y6A4" "P27695"
#[37,] "Q9Y6A4" "Q9Y5W9"
#[38,] "Q9Y6A4" "P28074"
#[39,] "Q9Y6A4" "P28066"
#[40,] "Q9Y6A4" "P25786"
#[41,] "Q9Y623" "P27695"
#[42,] "Q9Y623" "Q9Y5W9"
#[43,] "Q9Y623" "P28074"
#[44,] "Q9Y623" "P28066"
#[45,] "Q9Y623" "P25786"
#[46,] "P27695" "Q9Y5W9"
#[47,] "P27695" "P28074"
#[48,] "P27695" "P28066"
#[49,] "P27695" "P25786"
#[50,] "Q9Y5W9" "P28074"
#[51,] "Q9Y5W9" "P28066"
#[52,] "Q9Y5W9" "P25786"
#[53,] "P28074" "P28066"
#[54,] "P28074" "P25786"
#[55,] "P28066" "P25786"
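As a sanity check on the 55 rows: n items give n * (n - 1) / 2 unordered pairs, so 11 IDs give 55. The sketch below hard-codes the 11 IDs from the output above so that it runs on its own; the names pair_mat, pairs_df, first and second are illustrative.

```r
# The 11 unique IDs, copied from the output above
Un1 <- c("Q9Y6Y8", "Q9Y6W5", "Q9Y6H1", "Q5T1J5", "Q9Y6A4", "Q9Y623",
         "P27695", "Q9Y5W9", "P28074", "P28066", "P25786")

# combn returns one pair per column; t() turns that into one pair per row
pair_mat <- t(combn(Un1, 2))

# The row count matches the binomial coefficient choose(11, 2)
nrow(pair_mat) == choose(length(Un1), 2)  # TRUE

# Optional: a data frame with named columns instead of a matrix
pairs_df <- data.frame(first = pair_mat[, 1], second = pair_mat[, 2],
                       stringsAsFactors = FALSE)
head(pairs_df, 3)
```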

NOTE: This assumes that all columns are of class character.
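If the columns turn out to be factors (the default for data.frame and read.table before R 4.0.0), strsplit stops with a "non-character argument" error. One possible workaround, sketched here on a hypothetical two-column df_fac, is to convert every column with as.character first.

```r
# Hypothetical data frame whose columns are factors
df_fac <- data.frame(
  Column1 = c("Q9Y6Y8", "Q9Y623;P27695"),
  Column2 = c("P28074", "Q9Y6Y8"),
  stringsAsFactors = TRUE
)

# Convert every column to character, then apply the same steps as before
df_chr <- data.frame(lapply(df_fac, as.character), stringsAsFactors = FALSE)
Un_fac <- unique(unlist(strsplit(unlist(df_chr), ";")))
Un_fac
```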

Upvotes: 3
