Using “combn” to create list of dataframes for all combinations of variables selected and columns for variables not selected

Question

From the z dataframe below I would like to generate a list of new dataframes, each based on a unique combination of 3 of the values in the Name column. The Name column has 5 distinct names, each with several replicate rows, so there are 10 combinations of 3 (sampled without replacement, order not important; choose(5, 3) = 10). Although the selection is based on Name, I would like each new dataframe to keep the other columns as well, in this case Vote. In addition, I would like each dataframe to have two extra columns holding the rows that were not selected, that is, the remaining two names and their votes. For example, with the code below the first dataframe in combdf contains Jhon, Lee and Suzan and their votes. I could not figure out how to add two columns with the other two names (Abhinav and Brain) and their votes, and likewise for every other combination. These two columns will have fewer rows, so NA in the missing cells is fine.

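Name <- c("Jhon", "Lee", "Suzan", "Abhinav", "Brain")
Vote <- letters[1:21]
# the 5 names are recycled over the 21 votes
# (cbind() did the same recycling, but with a warning)
z <- data.frame(Name = rep(Name, length.out = length(Vote)), Vote = Vote)
comb <- combn(unique(as.character(z$Name)), 3)
combdf <- apply(comb, 2, function(vec) z[z$Name %in% vec, ])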
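combn() can also apply a function to each combination itself. With FUN and simplify = FALSE it returns the list of data frames directly, so the separate apply() call is not needed. The following is a minimal base-R sketch that rebuilds z with rep() instead of cbind() and produces the same 21 rows:

```r
# The question's data: 5 names recycled over 21 votes
Name <- c("Jhon", "Lee", "Suzan", "Abhinav", "Brain")
Vote <- letters[1:21]
z <- data.frame(Name = rep(Name, length.out = length(Vote)), Vote = Vote)

# FUN is called once per 3-name combination; simplify = FALSE keeps a list
combdf <- combn(unique(z$Name), 3,
                FUN = function(vec) z[z$Name %in% vec, ],
                simplify = FALSE)

length(combdf)   # one data frame per combination
choose(5, 3)     # the number of combinations quoted in the question
```

The first element holds the 13 rows for Jhon, Lee and Suzan, matching what apply() produces.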

akrun · Accepted Answer

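One easier option is bind_rows. Stack the rows whose Name is in 'vec' on top of a second data frame containing the remaining rows, and rename that second data frame's columns to Name2 and Vote2. Because the column names differ, bind_rows creates the new columns and fills the cells that have no value with NA.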

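library(dplyr)
out <- z %>%
  pull(Name) %>%
  unique() %>%
  combn(., 3, FUN = function(vec)
    z %>%
      # rows for the 3 selected names
      filter(Name %in% vec) %>%
      # rows for the other 2 names, renamed to Name2/Vote2 so that
      # bind_rows() adds them as new columns padded with NA
      bind_rows(z %>%
                  filter(!Name %in% vec) %>%
                  rename(Name2 = Name, Vote2 = Vote)),
    simplify = FALSE)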

-output

out[[1]]
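#    Name Vote   Name2 Vote2
#1   Jhon    a    <NA>  <NA>
#2    Lee    b    <NA>  <NA>
#3  Suzan    c    <NA>  <NA>
#4   Jhon    f    <NA>  <NA>
#5    Lee    g    <NA>  <NA>
#6  Suzan    h    <NA>  <NA>
#7   Jhon    k    <NA>  <NA>
#8    Lee    l    <NA>  <NA>
#9  Suzan    m    <NA>  <NA>
#10  Jhon    p    <NA>  <NA>
#11   Lee    q    <NA>  <NA>
#12 Suzan    r    <NA>  <NA>
#13  Jhon    u    <NA>  <NA>
#14  <NA> <NA> Abhinav     d
#15  <NA> <NA>   Brain     e
#16  <NA> <NA> Abhinav     i
#17  <NA> <NA>   Brain     j
#18  <NA> <NA> Abhinav     n
#19  <NA> <NA>   Brain     o
#20  <NA> <NA> Abhinav     s
#21  <NA> <NA>   Brain     t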
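For readers without dplyr, roughly the same result can be sketched in base R. The sketch pads the missing columns with NA explicitly before calling rbind(), which is the step bind_rows() performs automatically. The names stack_with_na and out_base are hypothetical and are not part of the answer:

```r
# Same data as the question: the 5 names recycled over 21 votes
Name <- c("Jhon", "Lee", "Suzan", "Abhinav", "Brain")
z <- data.frame(Name = rep(Name, length.out = 21), Vote = letters[1:21])

# Hypothetical helper: selected rows on top, renamed unselected rows below,
# with NA filling the columns each part does not have
stack_with_na <- function(vec) {
  sel  <- z[z$Name %in% vec, ]
  rest <- z[!z$Name %in% vec, ]
  names(rest) <- paste0(names(rest), "2")   # Name2, Vote2
  sel[c("Name2", "Vote2")] <- NA_character_
  rest[c("Name", "Vote")]  <- NA_character_
  out <- rbind(sel, rest[names(sel)])       # align column order, then stack
  rownames(out) <- NULL                     # renumber rows 1..n
  out
}

out_base <- combn(unique(z$Name), 3, FUN = stack_with_na, simplify = FALSE)
out_base[[1]][c(1, 13, 14), ]  # last selected row and first renamed row
```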

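Also, if we need the NA values at the bottom of Name2 and Vote2, so that the unselected names start on the first row, reorder those two columns with order(is.na(.)):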

out2 <- z %>%
  pull(Name) %>%
  unique() %>%
  combn(., 3, FUN = function(vec)
    z %>%
      filter(Name %in% vec) %>%
      bind_rows(z %>%
                  filter(!Name %in% vec) %>%
                  rename(Name2 = Name, Vote2 = Vote)) %>%
      # move the non-NA values of Name2 and Vote2 to the top
      mutate(across(c(Name2, Vote2), ~ .[order(is.na(.))])),
    simplify = FALSE)

out2[[1]]
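#    Name Vote   Name2 Vote2
#1   Jhon    a Abhinav     d
#2    Lee    b   Brain     e
#3  Suzan    c Abhinav     i
#4   Jhon    f   Brain     j
#5    Lee    g Abhinav     n
#6  Suzan    h   Brain     o
#7   Jhon    k Abhinav     s
#8    Lee    l   Brain     t
#9  Suzan    m    <NA>  <NA>
#10  Jhon    p    <NA>  <NA>
#11   Lee    q    <NA>  <NA>
#12 Suzan    r    <NA>  <NA>
#13  Jhon    u    <NA>  <NA>
#14  <NA> <NA>    <NA>  <NA>
#15  <NA> <NA>    <NA>  <NA>
#16  <NA> <NA>    <NA>  <NA>
#17  <NA> <NA>    <NA>  <NA>
#18  <NA> <NA>    <NA>  <NA>
#19  <NA> <NA>    <NA>  <NA>
#20  <NA> <NA>    <NA>  <NA>
#21  <NA> <NA>    <NA>  <NA>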
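The reordering works because order() on a logical vector puts FALSE (non-missing) before TRUE (missing) and breaks ties by original position. The following base-R sketch shows that step. It also shows one possible way, which is not part of the answer, to drop rows such as 14 to 21 above that end up NA in every column. The data frame df is a toy example shaped like the tail of out2[[1]]:

```r
# order(is.na(x)) puts non-missing values first, keeping their relative order
x <- c(NA, NA, "d", "e", NA)
x[order(is.na(x))]   # gives "d" "e" NA NA NA

# Toy data frame: the last row is NA in every column
df <- data.frame(Name  = c("Jhon", "Lee", NA),
                 Vote  = c("a", "b", NA),
                 Name2 = c("Abhinav", NA, NA),
                 Vote2 = c("d", NA, NA))

# keep only rows with at least one non-missing cell
df_kept <- df[rowSums(!is.na(df)) > 0, ]
df_kept
```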

Or we can also use setdiff() (or anti_join()) from dplyr to get the rows that were not selected:

out <- z %>%
  pull(Name) %>%
  unique() %>%
  combn(., 3, FUN = function(vec) {
    # rows for the 3 selected names
    z1 <- z %>%
      filter(Name %in% vec)
    # every remaining row, with columns renamed to Name2 and Vote2
    z2 <- setdiff(z, z1)
    names(z2) <- paste0(names(z2), 2)
    bind_rows(z1, z2)
  }, simplify = FALSE)
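The anti_join() variant mentioned above could look like the following sketch. It assumes dplyr 1.0 or later (for rename_with()) and keeps the rows of z that have no exact Name/Vote match in z1. The name out3 is hypothetical:

```r
library(dplyr)

Name <- c("Jhon", "Lee", "Suzan", "Abhinav", "Brain")
z <- data.frame(Name = rep(Name, length.out = 21), Vote = letters[1:21])

out3 <- combn(unique(z$Name), 3, FUN = function(vec) {
  z1 <- filter(z, Name %in% vec)
  # anti_join() keeps the rows of z with no match in z1, in z's order
  z2 <- anti_join(z, z1, by = c("Name", "Vote")) %>%
    rename_with(~ paste0(.x, "2"))   # Name -> Name2, Vote -> Vote2
  bind_rows(z1, z2)
}, simplify = FALSE)
```

Unlike setdiff(), anti_join() does not remove duplicate rows from z. That difference does not matter here because every Name/Vote pair is unique.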
