Reputation: 453
Let's say I want to subset my dataset df (m rows x n cols), always taking the first column and all the possible combinations of the other columns.
df = as.data.frame(matrix(rbinom(10*5, 1, .5), nrow = 10, ncol = 5))
So far I created the following function:
Mycomb = function(elements){
  n = length(elements)
  result = c()  # not named 'list', which would mask base::list
  for (i in 1:n){
    # append every combination of size i
    result = append(result, combn(x = elements, m = i, simplify = FALSE))
  }
  return(result)
}
I generate all the combinations of the columns 2:5
combinations = Mycomb(c(2,3,4,5))
and then subset the dataset in a loop, with the following code:
for (i in seq_along(combinations)){
  colOK = c(1, unlist(combinations[[i]], use.names = FALSE))
  cat("Selected columns: ", toString(colOK), "\n")
  print(df[, colOK])
}
This is the best code I could come up with, even if it does not look very clean. Is there a better way to do what I'm doing?
Upvotes: 1
Views: 416
Reputation: 76460
Your code can be greatly simplified, starting with the function Mycomb.
Note that I have added an extra argument, simplify, that defaults to FALSE.
Mycomb <- function(elements, simplify = FALSE){
  result <- lapply(seq_along(elements), function(m)
    combn(elements, m, simplify = simplify))
  result
}
combinations <- Mycomb(2:5)
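Unlike the loop version in the question, this Mycomb returns a nested list: one element per subset size, each holding the combinations of that size. A quick way to check the shape is sketched below (the function is repeated only so the snippet runs on its own):

```r
# Same definition as above, repeated so the snippet is self-contained
Mycomb <- function(elements, simplify = FALSE){
  lapply(seq_along(elements), function(m)
    combn(elements, m, simplify = simplify))
}
combinations <- Mycomb(2:5)

# One top-level element per subset size m = 1, ..., 4
length(combinations)
#[1] 4

# Number of combinations of each size, i.e. choose(4, m)
lengths(combinations)
#[1] 4 6 4 1

# The second combination of size 2
combinations[[2]][[2]]
#[1] 2 4
```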
Now if you want all the subsets of df, use a double lapply on the result combinations.
sub_df_list <- lapply(combinations, function(inx_list)
  lapply(inx_list, function(i) df[, c(1, i)])
)
length(sub_df_list[[1]])
#[1] 4
So the first member of the results list has a total of 4 sub data frames, one for each of the columns 2:5 paired with column 1.
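If you would rather have one flat list of 15 data frames, as in the loop from the question, one option is to flatten the nested list by a single level before subsetting. This is only a sketch: flat_combinations and flat_sub_dfs are names I made up for illustration, the seed is arbitrary, and drop = FALSE is my addition to keep the result a data frame in every case.

```r
set.seed(1)  # arbitrary seed, only to make the example reproducible
df <- as.data.frame(matrix(rbinom(10 * 5, 1, 0.5), nrow = 10, ncol = 5))

Mycomb <- function(elements, simplify = FALSE){
  lapply(seq_along(elements), function(m)
    combn(elements, m, simplify = simplify))
}
combinations <- Mycomb(2:5)

# Flatten one level: a single list of 15 index vectors
flat_combinations <- unlist(combinations, recursive = FALSE)

# Subset df once per index vector, always keeping column 1
flat_sub_dfs <- lapply(flat_combinations, function(i)
  df[, c(1, i), drop = FALSE])

# Label each subset with the columns it contains, e.g. "1, 2, 3"
names(flat_sub_dfs) <- vapply(flat_combinations, function(i)
  toString(c(1, i)), character(1))

length(flat_sub_dfs)
#[1] 15
```

The names make it easy to pull out a specific subset, for example flat_sub_dfs[["1, 2, 3"]].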
Upvotes: 1