Nested for loop to functions and lapply

Question

I am trying to refactor a nested for loop into functions, together with the code that calls them. The code below works as it is with the for loops, and it also runs when I wrap it in a function. However, I would like to avoid for loops inside my function and use lapply instead. How can I write the functions and the corresponding calling code using lapply?

Code with for loop:

df <- data.frame(actual=c("reaok_oc giade_len","reaok_oc giade_len reaok_oc giade_len"),
                  Predicted = c("giade_len","reaok_oc giade_len reaok_oc giade_len"))

df[] <- lapply(df, as.character)
str(df)

all_acc<-NULL
for(s in 1:nrow(df)){
  sub_df1<-df[s,]
  actual_words<-unlist(strsplit(sub_df1$actual," "))
  all_count<-0
  for(g in 1:length(actual_words)){
    count_len<-ifelse(grep(actual_words[g],sub_df1$Predicted),1,0)
    all_count<-sum(all_count,count_len)
  }
  sub_acc<-all_count/length(actual_words)
  all_acc<-c(all_acc,sub_acc)
}

df$trans_acc<-all_acc
sensitivity=sum(df$trans_acc)/nrow(df)
sensitivity

Here is my non-working attempt, which tries to use lapply to call the functions:


a1 <- function(df){
  sub_df1<-df[s,]
  actual_words<-unlist(strsplit(sub_df1$actual," "))
  all_count<-0
}

a2 <- function(df){
  count_len<-ifelse(grep(actual_words[g],sub_df1$Predicted),1,0)
  all_count<-sum(all_count,count_len)
  sub_acc<-all_count/length(actual_words)
  all_acc<-c(all_acc,sub_acc)
df$trans_acc<-all_acc
sensitivity=sum(df$trans_acc)/nrow(df)
sensitivity
}


lapply(1:nrow(df) FUN = a1, lapply(1:length(actual_words) FUN = a2, actual_words,sub_aa1))

Martin Morgan · Accepted Answer

In base R it is usually best to find solutions that are 'vectorized' (only one R function call) rather than 'iterated' (one call for each element). So for instance

for(s in 1:nrow(df)){
    sub_df1<-df[s,]
    actual_words<-unlist(strsplit(sub_df1$actual," "))
    ...

involves nrow(df) calls to strsplit(), but

actual <- strsplit(df$actual, " ")

involves just one call but performs the same transformation, returning a list with one character vector of words per row.
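As a quick check of that equivalence, the sketch below runs both versions on the data frame from the question and compares the results. The name per_row is introduced here only for illustration, and stringsAsFactors = FALSE stands in for the as.character() conversion in the question.

```r
df <- data.frame(
    actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
    Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"),
    stringsAsFactors = FALSE
)

# Iterated: one strsplit() call per row
per_row <- vector("list", nrow(df))
for (s in seq_len(nrow(df)))
    per_row[[s]] <- unlist(strsplit(df$actual[s], " "))

# Vectorized: a single strsplit() call for the whole column
actual <- strsplit(df$actual, " ")

# Both approaches should produce the same list of word vectors
identical(per_row, actual)
```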

I think also that when you say

    for(g in 1:length(actual_words)){
        count_len<-ifelse(grep(actual_words[g],sub_df1$Predicted),1,0)
        all_count<-sum(all_count,count_len)
    }

really you are just looking for exact matches between actual words and predicted words. So you could split the predicted words

predicted <- strsplit(df$Predicted, " ")

and calculate sum(actual[[1]] %in% predicted[[1]]) for the first row, and likewise for every other row. Write this as a function

actual_in_predicted <- function(actual, predicted) {
    sum(actual %in% predicted)
}
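One caveat is worth spelling out: grep() matches its pattern anywhere inside the string, whereas %in% compares whole words, so the two can disagree when one word is contained in another. The sketch below uses made-up words (not the question's data) to show the difference; fixed = TRUE is an assumption made here to keep the comparison about substrings rather than regular expressions.

```r
actual_in_predicted <- function(actual, predicted) {
    sum(actual %in% predicted)
}

# Hypothetical words chosen to expose the difference
actual <- c("len", "giade_len")
predicted <- "giade_len"

# Substring matching: "len" is also found inside "giade_len"
substring_hits <- sum(vapply(
    actual,
    function(w) grepl(w, predicted, fixed = TRUE),
    logical(1)
))

# Whole-word matching: only "giade_len" counts
exact_hits <- actual_in_predicted(actual, predicted)

c(substring = substring_hits, exact = exact_hits)
```

If substring matching really is what you want, the %in% version is not a drop-in replacement.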

A 'for' loop might iterate over each element of actual and predicted

all_count <- integer(nrow(df))
for (i in seq_len(nrow(df)))
    all_count[[i]] <- actual_in_predicted(actual[[i]], predicted[[i]])

but it's better to use mapply() to iterate over each element of actual and predicted

all_count <- mapply(actual_in_predicted, actual, predicted)
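Since the question asks specifically about lapply(), here is a sketch of how the same counts could be obtained by iterating over row indices instead. It is only one possible formulation; mapply() is arguably tidier because it walks both lists in parallel without explicit indexing. The names by_mapply and by_lapply are introduced here for illustration.

```r
actual_in_predicted <- function(actual, predicted) {
    sum(actual %in% predicted)
}

df <- data.frame(
    actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
    Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"),
    stringsAsFactors = FALSE
)
actual <- strsplit(df$actual, " ")
predicted <- strsplit(df$Predicted, " ")

# mapply() pairs up the i-th elements of both lists
by_mapply <- mapply(actual_in_predicted, actual, predicted)

# lapply() over row indices returns a list, so flatten it with unlist()
by_lapply <- unlist(lapply(seq_along(actual), function(i) {
    actual_in_predicted(actual[[i]], predicted[[i]])
}))

identical(by_mapply, by_lapply)
```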

Your variable all_acc is this vector of numbers divided by the number of actual words in each comparison

all_acc <- all_count / lengths(actual)

The complete revised code uses a function to compare actual and predicted words in each row, and uses mapply() rather than an explicit loop to apply it to every row.

actual_in_predicted <- function(actual, predicted) {
    sum(actual %in% predicted)
}

actual <- strsplit(df$actual, " ")
predicted <- strsplit(df$Predicted, " ")

all_count <- mapply(actual_in_predicted, actual, predicted)
all_acc <- all_count / lengths(actual)

df$trans_acc <- all_acc
sensitivity <- sum(df$trans_acc) / nrow(df)
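If you want all of this behind a single call, one option is to wrap the steps in a function. This is a minimal sketch: the name word_sensitivity is hypothetical, and it assumes the data frame has character columns called actual and Predicted, as in the question.

```r
# Hypothetical wrapper; assumes character columns `actual` and `Predicted`
word_sensitivity <- function(df) {
    actual <- strsplit(df$actual, " ")
    predicted <- strsplit(df$Predicted, " ")
    # Number of actual words that also appear among the predicted words, per row
    all_count <- mapply(function(a, p) sum(a %in% p), actual, predicted)
    # Per-row accuracy, averaged over rows (same as sum(...) / nrow(df))
    mean(all_count / lengths(actual))
}

df <- data.frame(
    actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
    Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"),
    stringsAsFactors = FALSE
)

result <- word_sensitivity(df)
result  # 0.75 for this data: per-row accuracies are 0.5 and 1
```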
