I am trying to write two functions, and the code that calls them, to replace a nested for loop. The code below works as it is, with for loops. But I want to avoid for loops inside my functions and use lapply instead. How can I write the functions and the code that calls them using lapply?
Code with for loops:
df <- data.frame(actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"))
df[] <- lapply(df, as.character)  # make sure both columns are character, not factor
str(df)
all_acc <- NULL
for (s in 1:nrow(df)) {
  sub_df1 <- df[s, ]
  # split the actual sentence of this row into words
  actual_words <- unlist(strsplit(sub_df1$actual, " "))
  all_count <- 0
  for (g in seq_along(actual_words)) {
    # counts 1 if this actual word is found in the predicted sentence
    count_len <- ifelse(grep(actual_words[g], sub_df1$Predicted), 1, 0)
    all_count <- sum(all_count, count_len)
  }
  # fraction of actual words found in this row's prediction
  sub_acc <- all_count / length(actual_words)
  all_acc <- c(all_acc, sub_acc)
}
df$trans_acc <- all_acc
sensitivity <- sum(df$trans_acc) / nrow(df)
sensitivity
Here is my non-working attempt, which calls the functions with lapply:
a1 <- function(df){
  sub_df1<-df[s,]
  actual_words<-unlist(strsplit(sub_df1$actual," "))
  all_count<-0
}
a2 <- function(df){
  count_len<-ifelse(grep(actual_words[g],sub_df1$Predicted),1,0)
  all_count<-sum(all_count,count_len)
  sub_acc<-all_count/length(actual_words)
  all_acc<-c(all_acc,sub_acc)
  df$trans_acc<-all_acc
  sensitivity=sum(df$trans_acc)/nrow(df)
  sensitivity
}
lapply(1:nrow(df) FUN = a1, lapply(1:length(actual_words) FUN = a2, actual_words,sub_aa1))
In base R it is usually best to find solutions that are 'vectorized' (only one R function call) rather than 'iterated' (one call for each element). So for instance
for(s in 1:nrow(df)){
  sub_df1<-df[s,]
  actual_words<-unlist(strsplit(sub_df1$actual," "))
  ...
involves nrow(df) calls to strsplit(), but
actual <- strsplit(df$actual, " ")
involves just one call and performs the same transformation.
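A minimal sketch of that claim, using the question's data (with stringsAsFactors = FALSE added so it also works on R versions before 4.0): the per-row loop and the single vectorized strsplit() call give identical lists of words.

```r
df <- data.frame(actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 stringsAsFactors = FALSE)

# iterated: one strsplit() call per row
actual_loop <- vector("list", nrow(df))
for (s in seq_len(nrow(df)))
  actual_loop[[s]] <- unlist(strsplit(df$actual[s], " "))

# vectorized: one strsplit() call returns a list with one element per row
actual <- strsplit(df$actual, " ")

identical(actual_loop, actual)  # TRUE
lengths(actual)                 # 2 4
```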
I also think that when you write
for(g in 1:length(actual_words)){
  count_len<-ifelse(grep(actual_words[g],sub_df1$Predicted),1,0)
  all_count<-sum(all_count,count_len)
}
you are really just looking for exact matches between actual words and predicted words. So you could split the predicted words too
predicted <- strsplit(df$Predicted, " ")
and calculate sum(actual[[1]] %in% predicted[[1]]), and so on for each row. Write this as a function:
actual_in_predicted <- function(actual, predicted) {
  sum(actual %in% predicted)
}
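Applied to the two rows of the question's data, the function counts 1 and 4 matching words. One caveat worth seeing: grep()/grepl() treat each actual word as a regular expression and match substrings, while %in% compares whole words. They agree on this data, but not in general; "oc" below is a hypothetical word chosen only to show the difference.

```r
actual_in_predicted <- function(actual, predicted) {
  sum(actual %in% predicted)
}

df <- data.frame(actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 stringsAsFactors = FALSE)
actual <- strsplit(df$actual, " ")
predicted <- strsplit(df$Predicted, " ")

actual_in_predicted(actual[[1]], predicted[[1]])  # 1: only "giade_len" is found
actual_in_predicted(actual[[2]], predicted[[2]])  # 4

# substring (regex) matching versus whole-word matching
grepl("oc", "reaok_oc giade_len")      # TRUE: "oc" occurs inside "reaok_oc"
"oc" %in% c("reaok_oc", "giade_len")   # FALSE: no word equals "oc"
```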
A 'for' loop might iterate over each element of actual and predicted:
all_count <- integer()
for (i in 1:nrow(df))
  all_count[[i]] <- actual_in_predicted(actual[[i]], predicted[[i]])
but it's better to use mapply() to iterate over each element of actual and predicted:
all_count <- mapply(actual_in_predicted, actual, predicted)
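Since the question asked specifically about lapply(), here is a sketch showing that iterating over row indices with lapply(), or with vapply(), which also checks that each result is a single integer, gives the same counts as mapply():

```r
actual_in_predicted <- function(actual, predicted) {
  sum(actual %in% predicted)
}

df <- data.frame(actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 stringsAsFactors = FALSE)
actual <- strsplit(df$actual, " ")
predicted <- strsplit(df$Predicted, " ")

# mapply() walks actual and predicted element by element
all_count <- mapply(actual_in_predicted, actual, predicted)

# lapply() over row indices returns a list, so unlist() it
all_count_lapply <- unlist(lapply(seq_along(actual), function(i)
  actual_in_predicted(actual[[i]], predicted[[i]])))

# vapply() returns a vector directly and checks each result is one integer
all_count_vapply <- vapply(seq_along(actual), function(i)
  actual_in_predicted(actual[[i]], predicted[[i]]), integer(1))

all_count  # 1 4
```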
Your variable all_acc is this vector of counts divided by the number of actual words in each comparison:
all_acc <- all_count / lengths(actual)
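For the question's data this gives 0.5 for the first row (1 of 2 words found) and 1 for the second (4 of 4). lengths() is a vectorized shortcut for sapply(actual, length), as this sketch checks:

```r
df <- data.frame(actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 stringsAsFactors = FALSE)
actual <- strsplit(df$actual, " ")
predicted <- strsplit(df$Predicted, " ")
all_count <- mapply(function(a, p) sum(a %in% p), actual, predicted)

# number of actual words in each row
n_words <- lengths(actual)
identical(n_words, sapply(actual, length))  # TRUE

all_acc <- all_count / n_words
all_acc  # 0.5 1.0
```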
The complete revised code uses a function to compare the actual and predicted words in each row, and mapply() to iterate over the rows.
actual_in_predicted <- function(actual, predicted) {
  sum(actual %in% predicted)
}
actual <- strsplit(df$actual, " ")
predicted <- strsplit(df$Predicted, " ")
all_count <- mapply(actual_in_predicted, actual, predicted)
all_acc <- all_count / lengths(actual)
df$trans_acc <- all_acc
sensitivity <- sum(df$trans_acc) / nrow(df)
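If you would rather keep the two-level structure from the question (one function for a single word, one for a single row) and call it with lapply(), a sketch along these lines should work. word_found() and row_accuracy() are hypothetical names, and grepl() reproduces the substring matching of the original grep()/ifelse() code rather than the whole-word matching of %in%:

```r
df <- data.frame(actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 stringsAsFactors = FALSE)

# inner step (hypothetical helper): 1 if one actual word occurs in the predicted text
word_found <- function(word, predicted_text) {
  as.integer(grepl(word, predicted_text))
}

# outer step (hypothetical helper): fraction of actual words found for row s
row_accuracy <- function(s, data) {
  actual_words <- unlist(strsplit(data$actual[s], " "))
  counts <- sapply(actual_words, word_found, predicted_text = data$Predicted[s])
  sum(counts) / length(actual_words)
}

df$trans_acc <- unlist(lapply(seq_len(nrow(df)), row_accuracy, data = df))
sensitivity <- mean(df$trans_acc)  # same as sum(df$trans_acc) / nrow(df)
sensitivity  # 0.75
```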
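Perhaps we can use separate_rows() to put each actual word (split on spaces) on its own row, compute the proportion of words detected in each original row, and then average over rows
library(dplyr)
library(tidyr)
library(stringr)
df %>%
  mutate(row_id = row_number()) %>%
  separate_rows(actual, sep = " ") %>%
  group_by(row_id) %>%
  summarise(trans_acc = mean(str_detect(Predicted, actual))) %>%
  summarise(perc = mean(trans_acc))
# perc: 0.75
It can be wrapped into a function
f1 <- function(data, act, pred) {
  data %>%
    mutate(row_id = row_number()) %>%
    separate_rows({{act}}, sep = " ") %>%
    group_by(row_id) %>%
    summarise(acc = mean(str_detect({{pred}}, {{act}}))) %>%
    summarise(perc = mean(acc))
}
f1(df, actual, Predicted)
# perc: 0.75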
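Whether the average is taken per row or over all words pooled together matters here. A minimal base-R sketch with the question's data, assuming words are separated by single spaces: averaging the per-row accuracies reproduces the question's sensitivity of 0.75, while pooling every word (so longer rows weigh more) gives 5/6.

```r
df <- data.frame(actual = c("reaok_oc giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 Predicted = c("giade_len", "reaok_oc giade_len reaok_oc giade_len"),
                 stringsAsFactors = FALSE)
actual <- strsplit(df$actual, " ")
predicted <- strsplit(df$Predicted, " ")
all_count <- mapply(function(a, p) sum(a %in% p), actual, predicted)

# accuracy of each row, then averaged over rows (the question's sensitivity)
per_row <- mean(all_count / lengths(actual))

# every word counted once across all rows
pooled <- sum(all_count) / sum(lengths(actual))

per_row  # 0.75
pooled   # 5/6, about 0.833
```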