Reputation: 1017
From the dataframe z below, I would like to generate a list of new dataframes, each based on a unique combination of 3 different values of the Name vector. The dataframe has 5 different names in the Name column (each with multiple replicate entries), so the number of different combinations of 3 is 10 (sampled without replacement, order not important). Although the selection is based on the Name vector, I would like the new dataframes to include the other columns of information, in this case the Vote column. Besides these two columns, I would also like each dataframe to contain two additional columns holding the rows that were not selected. For example, if you run the code below, the first dataframe in combdf will contain Jhon, Lee, Suzan and their votes. But I could not figure out how to add to that dataframe two columns for the remaining two names and their votes, and likewise for the rest of the dataframes. These two columns will have fewer rows, so I am fine with NA for the missing cells.
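Name <- c("Jhon", "Lee", "Suzan", "Abhinav", "Brain")
Vote <- letters[1:21]
# Recycle the 5 names explicitly to match the 21 votes (avoids the cbind() recycling warning)
z <- data.frame(Name = rep(Name, length.out = length(Vote)), Vote = Vote)
comb <- combn(unique(as.character(z$Name)), 3)
combdf <- apply(comb, 2, function(vec) z[z$Name %in% vec, ])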
Upvotes: 1
Views: 393
Reputation: 886948
One easier option is to use bind_rows: filter the rows whose 'Name' is in 'vec', then bind a second dataset holding the rows that are not in 'vec', renamed so that they form new columns filled with NA.
library(dplyr)
out <- z %>%
  pull(Name) %>%
  unique %>%
  combn(., 3, FUN = function(vec)
    z %>%
      filter(Name %in% vec) %>%
      bind_rows(z %>%
                  filter(!Name %in% vec) %>%
                  rename(Name2 = Name, Vote2 = Vote)), simplify = FALSE)
Output:
out[[1]]
# Name Vote Name2 Vote2
#1 Jhon a <NA> <NA>
#2 Lee b <NA> <NA>
#3 Suzan c <NA> <NA>
#4 Jhon f <NA> <NA>
#5 Lee g <NA> <NA>
#6 Suzan h <NA> <NA>
#7 Jhon k <NA> <NA>
#8 Lee l <NA> <NA>
#9 Suzan m <NA> <NA>
#10 Jhon p <NA> <NA>
#11 Lee q <NA> <NA>
#12 Suzan r <NA> <NA>
#13 Jhon u <NA> <NA>
#14 <NA> <NA> Abhinav d
#15 <NA> <NA> Brain e
#16 <NA> <NA> Abhinav i
#17 <NA> <NA> Brain j
#18 <NA> <NA> Abhinav n
#19 <NA> <NA> Brain o
#20 <NA> <NA> Abhinav s
#21 <NA> <NA> Brain t
Also, if we need the NA values of the new columns at the bottom:
out2 <- z %>%
  pull(Name) %>%
  unique %>%
  combn(., 3, FUN = function(vec)
    z %>%
      filter(Name %in% vec) %>%
      bind_rows(z %>%
                  filter(!Name %in% vec) %>%
                  rename(Name2 = Name, Vote2 = Vote)) %>%
      mutate(across(c(Name2, Vote2),
                    ~ .[order(is.na(.))])), simplify = FALSE)
out2[[1]]
# Name Vote Name2 Vote2
#1 Jhon a Abhinav d
#2 Lee b Brain e
#3 Suzan c Abhinav i
#4 Jhon f Brain j
#5 Lee g Abhinav n
#6 Suzan h Brain o
#7 Jhon k Abhinav s
#8 Lee l Brain t
#9 Suzan m <NA> <NA>
#10 Jhon p <NA> <NA>
#11 Lee q <NA> <NA>
#12 Suzan r <NA> <NA>
#13 Jhon u <NA> <NA>
#14 <NA> <NA> <NA> <NA>
#15 <NA> <NA> <NA> <NA>
#16 <NA> <NA> <NA> <NA>
#17 <NA> <NA> <NA> <NA>
#18 <NA> <NA> <NA> <NA>
#19 <NA> <NA> <NA> <NA>
#20 <NA> <NA> <NA> <NA>
#21 <NA> <NA> <NA> <NA>
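The `.[order(is.na(.))]` step above moves the non-missing values of a column to the top while keeping their relative order. The following is a minimal base R sketch of that idea on a small made-up data frame `d` (hypothetical, not the original data); the last step, which drops rows where every column is NA, is an optional extra that the answer itself does not do.

```r
# Hypothetical data, only to illustrate the reordering trick.
d <- data.frame(Name  = c("Jhon", "Lee", NA, NA),
                Name2 = c(NA, NA, "Abhinav", "Brain"),
                stringsAsFactors = FALSE)

# is.na() is FALSE for present values and TRUE for NA. FALSE sorts first,
# and order() leaves ties in their original order, so present values move up.
d$Name2 <- d$Name2[order(is.na(d$Name2))]

# Optional: drop rows in which every column is NA.
d <- d[rowSums(!is.na(d)) > 0, ]
d
```

Applied to out2, the same row filter would probably remove the eight all-NA rows at the bottom of each data frame.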
Or we can also use setdiff/anti_join from dplyr:
out <- z %>%
  pull(Name) %>%
  unique %>%
  combn(., 3, FUN = function(vec) {
    z1 <- z %>%
      filter(Name %in% vec)
    z2 <- setdiff(z, z1)
    names(z2) <- paste0(names(z2), 2)
    bind_rows(z1, z2)
  }, simplify = FALSE)
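The anti_join variant mentioned above is not spelled out, so here is a minimal sketch of how it might look. It rebuilds z as in the question and assumes that joining on both Name and Vote is what is wanted; the name out3 is only illustrative. One likely difference from setdiff is that anti_join keeps duplicated rows, which should not matter here because every Name/Vote row is unique.

```r
library(dplyr)

# Rebuild the example data from the question (Name recycled to 21 rows).
Name <- c("Jhon", "Lee", "Suzan", "Abhinav", "Brain")
Vote <- letters[1:21]
z <- data.frame(Name = rep(Name, length.out = length(Vote)), Vote = Vote,
                stringsAsFactors = FALSE)

out3 <- combn(unique(z$Name), 3, FUN = function(vec) {
  z1 <- filter(z, Name %in% vec)                  # rows for the selected names
  z2 <- anti_join(z, z1, by = c("Name", "Vote"))  # rows of z that are not in z1
  names(z2) <- paste0(names(z2), 2)               # becomes Name2, Vote2
  bind_rows(z1, z2)                               # unmatched columns are filled with NA
}, simplify = FALSE)

length(out3)      # one data frame per combination
names(out3[[1]])
```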
Upvotes: 1
Reputation: 1234
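# Helper: returns a data frame of n all-NA rows with the same columns as df
f <- function(df, n) {
  naDF <- df[1, ]
  naDF[1, ] <- NA
  naDF[rep(seq_len(nrow(naDF)), each = n), ]
}
# previous code unchanged
df <- lapply(seq_len(ncol(comb)), function(x) {
  df1 <- z[z$Name %in% comb[, x], ]   # rows for the selected names
  df2 <- z[!z$Name %in% comb[, x], ]  # rows for the remaining names
  # pad df2 with NA rows up to the height of df1, then bind side by side
  cbind(df1, rbind(df2, f(df2, nrow(df1) - nrow(df2))))
})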
> df[[1]]
Name Vote Name Vote
1 Jhon a Abhinav d
2 Lee b Brain e
3 Suzan c Abhinav i
6 Jhon f Brain j
7 Lee g Abhinav n
8 Suzan h Brain o
11 Jhon k Abhinav s
12 Lee l Brain t
13 Suzan m <NA> <NA>
16 Jhon p <NA> <NA>
17 Lee q <NA> <NA>
18 Suzan r <NA> <NA>
21 Jhon u <NA> <NA>
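The output above has two pairs of columns with the same names (Name, Vote), which can make later column selection ambiguous. A possible variation, sketched below under the assumption that suffixed names (Name2, Vote2) are acceptable, renames the second pair and pads by indexing past the last row instead of using the helper f. The names res and pad are illustrative only.

```r
# Rebuild the example data and combinations from the question.
Name <- c("Jhon", "Lee", "Suzan", "Abhinav", "Brain")
Vote <- letters[1:21]
z <- data.frame(Name = rep(Name, length.out = length(Vote)), Vote = Vote,
                stringsAsFactors = FALSE)
comb <- combn(unique(z$Name), 3)

# Indexing a data frame past its last row yields all-NA rows,
# so this pads d to n rows and resets the row names.
pad <- function(d, n) {
  d <- d[seq_len(n), , drop = FALSE]
  rownames(d) <- NULL
  d
}

res <- lapply(seq_len(ncol(comb)), function(i) {
  sel <- z$Name %in% comb[, i]
  df1 <- z[sel, ]                     # selected names
  df2 <- z[!sel, ]                    # remaining names
  names(df2) <- paste0(names(df2), 2) # Name2, Vote2
  n <- max(nrow(df1), nrow(df2))
  cbind(pad(df1, n), pad(df2, n))
})

names(res[[1]])
```

Padding both halves to the larger row count also covers the case where the unselected part is the longer one.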
Upvotes: 1