I would like to optimize the following nested for loop, which iterates over a selected set of columns in a data frame and checks which values in each column match the values/strings in a vector:
Positions <- c()
for (col in vectorCols) {
  for (code in vectorCodes) {
    # append the row indices where this column equals this code
    Positions <- c(Positions, which(as.numeric(df[, col]) == code))
  }
}
The data frame is quite big, with 800,000 rows; vectorCodes can be 100 items long, and I select about 20 columns (out of 2,000).
I also tried something like the following, but it didn't help:
FunctionGrepCol <- function(col) {
  # row indices in this column matching each code, flattened into one vector
  unlist(lapply(vectorCodes, function(x) which(as.numeric(df[, col]) == x)))
}
Positions <- unlist(lapply(vectorCols, FunctionGrepCol))
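Written with a plain which(...) inside the inner function, the lapply attempt computes exactly what the loop does, which is why it does not help much. The minimal sketch below uses made-up data (the df, vectorCols and vectorCodes values are hypothetical) to show that both versions return the same positions in the same order: lapply still calls which() once per (column, code) pair, so it avoids growing Positions with c() but does not reduce the number of comparisons.

```r
# Hypothetical toy data; the question's real df is far larger
df <- data.frame(a = c(5, 1, 3, 5, 2), b = c(4, 4, 1, 9, 3))
vectorCols  <- c("a", "b")
vectorCodes <- c(1, 5)

# Nested-loop version from the question
Positions <- c()
for (col in vectorCols) {
  for (code in vectorCodes) {
    Positions <- c(Positions, which(as.numeric(df[, col]) == code))
  }
}

# lapply version: still one which() call per (column, code) pair,
# so the comparison work is the same as in the loop
FunctionGrepCol <- function(col) {
  unlist(lapply(vectorCodes, function(x) which(as.numeric(df[, col]) == x)))
}
PositionsApply <- unlist(lapply(vectorCols, FunctionGrepCol))

print(Positions)                          # 2 1 4 3
print(identical(Positions, PositionsApply))
```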
Is there a way to turn the nested for loop into an apply function to optimize it?
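You can try this solution: instead of looping through the column names and subsetting each time, subset your data frame first and then loop through it just like a list (a data frame is a list of columns). Using %in% tests each column against all of vectorCodes in a single call, which replaces the inner loop.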
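# df[vectorCols] (rather than df[, vectorCols]) stays a data frame even if only one column is selected
Positions <- unlist(lapply(df[vectorCols], function(col) which(col %in% vectorCodes)))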
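As a minimal sketch, assuming small made-up values for df, vectorCols and vectorCodes (the question's real data is not shown), the following runs both versions side by side. The %in% version makes one pass per column instead of one == comparison per (column, code) pair; at the question's sizes that is about 20 passes over 800,000 rows instead of 20 × 100 = 2,000. It finds the same rows, but within each column they come out in row order rather than grouped by code, and unlist adds names such as a1.

```r
# Hypothetical toy data standing in for the question's df, vectorCols and vectorCodes
df <- data.frame(a = c(5, 1, 3, 5, 2), b = c(4, 4, 1, 9, 3), z = c(7, 8, 9, 1, 1))
vectorCols  <- c("a", "b")   # column z is not searched
vectorCodes <- c(1, 5)

# Question's approach: one full comparison per (column, code) pair
loopPositions <- c()
for (col in vectorCols) {
  for (code in vectorCodes) {
    loopPositions <- c(loopPositions, which(as.numeric(df[, col]) == code))
  }
}

# Answer's approach: %in% checks all codes in one call per column
vecPositions <- unlist(lapply(df[vectorCols], function(col) which(col %in% vectorCodes)))

print(loopPositions)   # 2 1 4 3: grouped by code within each column
print(vecPositions)    # values 1 2 4 3, named a1 a2 a3 b: row order within each column
print(identical(sort(loopPositions), sort(unname(vecPositions))))
```

The sketch uses numeric columns. If the real columns are factors, note that as.numeric() in the loop returns the factor's level codes, while %in% compares the labels, so the two versions can disagree; check the column types before switching.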