optimize a nested for loop in R, using apply?

Question

I would like to optimize the following nested for loop, which iterates over a selected set of columns in a data frame and checks whether their values match the values/strings in a vector:

Positions <- c()
for (col in vectorCols) {
  for (code in vectorCodes) {
    Positions <- c(Positions, which(as.numeric(df[, col]) == code))
  }
}

The data frame is quite big, with 800,000 rows. vectorCodes can be 100 items long, and vectorCols holds about 20 selected columns (out of 2000).

I also tried something like the following, but it didn't help:

  # For one column, collect the matching row positions for every code
  FunctionGrepCol <- function(col) {
    unlist(lapply(vectorCodes, function(x) which(as.numeric(df[, col]) == x)))
  }
  Positions <- unlist(lapply(vectorCols, FunctionGrepCol))

Is there a way to put the nested for loop into an apply function to optimize it?

akuiper · Accepted Answer

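You can try this solution: instead of looping through the column names and subsetting each time, subset your data frame first and then loop through its columns just like a list. Using %in% also replaces the inner loop over vectorCodes with a single vectorized test per column.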

Positions <- unlist(lapply(df[, vectorCols], function(col) which(col %in% vectorCodes)))
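
To check that the one-liner finds the same rows as the nested loop, here is a small sketch on made-up data. The names df, vectorCols and vectorCodes mirror the question, but the values are purely illustrative. Two details are assumptions on my part rather than part of the answer above: as.numeric() is kept so the comparison matches the one in the question, and drop = FALSE is added so the subset stays a data frame even when vectorCols has only one entry.

```r
# Toy data (hypothetical values); columns are stored as strings,
# which is presumably why the question calls as.numeric().
df <- data.frame(
  a = c("1", "5", "3", "7"),
  b = c("2", "3", "9", "1"),
  c = c("3", "3", "3", "3"),
  stringsAsFactors = FALSE
)
vectorCols  <- c("a", "b")
vectorCodes <- c(1, 3)

# Nested-loop version from the question: grows Positions on every match.
loopPositions <- c()
for (col in vectorCols) {
  for (code in vectorCodes) {
    loopPositions <- c(loopPositions, which(as.numeric(df[, col]) == code))
  }
}

# Vectorized version: one %in% test per selected column.
applyPositions <- unlist(
  lapply(df[, vectorCols, drop = FALSE],
         function(col) which(as.numeric(col) %in% vectorCodes)),
  use.names = FALSE
)

# Same row indices; only the order within a column can differ,
# because the loop groups by code while which() returns rows in order.
identical(sort(loopPositions), sort(applyPositions))
```

Without drop = FALSE, selecting a single column would return a plain vector, and lapply would then iterate over its elements instead of over columns.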
