Forinstance

r - best function to select combinations of columns in a dataset

Let's say I want to subset my dataset df (m rows x n columns), always keeping the first column together with every possible combination of the other columns.

df = as.data.frame(matrix(rbinom(10*5, 1, .5), nrow = 10, ncol = 5))

So far I have written the following function:

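Mycomb = function(elements){
  n = length(elements)
  result = list()   # named "result" to avoid shadowing base::list
  for (i in seq_len(n)){
    # append every combination of size i as a separate list element
    result = append(result, combn(x = elements, m = i, simplify = FALSE))
  }

  return(result)
}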
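For comparison, a minimal sketch (base R only) that builds the same flat list of 15 column-index vectors without growing an object inside a loop: lapply() collects the combinations of each size and unlist(..., recursive = FALSE) flattens them by one level. The name all_combs is illustrative, not from the original code.

```r
cols <- c(2, 3, 4, 5)

# combn() returns the combinations of each size m = 1..4 as a list of vectors;
# unlist(recursive = FALSE) removes only the outer level, keeping each vector intact
all_combs <- unlist(
  lapply(seq_along(cols), function(m) combn(cols, m, simplify = FALSE)),
  recursive = FALSE
)

length(all_combs)  # 15, i.e. 2^4 - 1 non-empty subsets
all_combs[[5]]     # 2 3, the first pair
```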

I generate all the combinations of columns 2 to 5

combinations = Mycomb(c(2,3,4,5))

and then subset the dataset in a loop, with the following code:

for (i in seq_along(combinations)){

  colOK = c(1, combinations[[i]])   # always keep column 1
  cat("Selected columns: ", toString(colOK), "\n")
  print(df[, colOK])

}

This is the best code I could come up with, but it does not look very clean. Is there a better way to do this?

Answers (1)

Rui Barradas

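Your code can be greatly simplified, starting with the function Mycomb.
Note that I have added an extra argument, simplify, which defaults to FALSE and is passed on to combn. Unlike your version, this one returns a nested list: one element per subset size, each holding all the combinations of that size.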

Mycomb <- function(elements, simplify = FALSE){
  result <- lapply(seq_along(elements), function(m)
    combn(elements, m, simplify = simplify))

  result
}

combinations <- Mycomb(2:5)
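Because this Mycomb keeps one list element per subset size, the structure of combinations is easy to inspect. A small sketch that re-creates the same lapply/combn call so it runs on its own (nested is an illustrative name):

```r
cols <- 2:5

# Same construction as Mycomb(cols): element m holds all combinations of size m
nested <- lapply(seq_along(cols), function(m) combn(cols, m, simplify = FALSE))

lengths(nested)    # 4 6 4 1, i.e. choose(4, 1:4)
nested[[2]][[1]]   # 2 3, the first pair

# With simplify = TRUE, combn() instead returns a matrix with one combination per column
dim(combn(cols, 2))  # 2 6
```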

Now, to get all the subsets of df, use a nested lapply over combinations: the outer one runs over the subset sizes, the inner one over the combinations of each size.

sub_df_list <- lapply(combinations, function(inx_list)
    lapply(inx_list, function(i) df[, c(1, i)])
  )

length(sub_df_list[[1]])
#[1] 4

So the first member of the result list holds 4 sub data frames, one for each of columns 2 to 5 paired with column 1.
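If a single flat list is more convenient than the nested one, the result can be concatenated one level and each data frame labelled with its columns. A sketch assuming df has the default V1..V5 names produced by as.data.frame(); flat is an illustrative name, and set.seed() is added only for reproducibility:

```r
set.seed(1)  # only to make the random example reproducible
df <- as.data.frame(matrix(rbinom(10 * 5, 1, 0.5), nrow = 10, ncol = 5))

# Nested index list, as returned by Mycomb(2:5) above
combinations <- lapply(1:4, function(m) combn(2:5, m, simplify = FALSE))

# Double lapply as above, then concatenate the four per-size lists into one
sub_df_list <- lapply(combinations, function(inx_list)
  lapply(inx_list, function(i) df[, c(1, i)]))
flat <- do.call(c, sub_df_list)

# Name each data frame after its columns, e.g. "V1, V2"
names(flat) <- vapply(flat, function(d) toString(names(d)), character(1))

length(flat)      # 15
names(flat)[1:2]  # "V1, V2" "V1, V3"
```

Each subset can then be retrieved by name, for example flat[["V1, V2, V3"]].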
