Reputation: 1555
I am looping over two data frames in R: the first contains a list of words and the second contains a list of paragraphs.
My task is to count how many of the words from the first data frame occur in each paragraph, and to store that number in the "count" column of the second data frame.
I did it with nested for loops, but it takes a huge amount of time.
Is there any way to improve the performance of the code below?
for (i in 1:nrow(words)) {
  for (j in 1:nrow(paragraphs)) {
    if (grepl(words[i, 1], paragraphs[j, 1])) {
      paragraphs[j, "count"] <- paragraphs[j, "count"] + 1
    }
  }
}
Updated:
Sample Words:
afflict
afraid
aggressive
annoying
Sample Paragraphs:
the phone will be too tall and bulky and annoying to tolerate use
the only thing that i am afraid of if bernie sanders is president he may die in office
Bernie Sanders may care about the issues, but does he really understand them? That is the question
Upvotes: 0
Views: 64
Reputation: 12559
You can replace both loops with vectorized calls to grepl(), sapply() and rowSums():
words <- data.frame(w = c("afflict", "afraid", "aggressive", "annoying"))
paragraphs <- data.frame(p = c(
  "the phone will be too tall and bulky and annoying to tolerate use",
  "the only thing that i am afraid of if bernie sanders is president he may die in office",
  "Bernie Sanders may care about the issues, but does he really understand them? That is the question"))
# sapply() builds a logical matrix (one row per paragraph, one column per word);
# rowSums() then counts the matching words for each paragraph.
paragraphs$count <- rowSums(sapply(words[, 1], grepl, x = paragraphs[, 1], fixed = TRUE))
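One caveat with the one-liner: sapply() simplifies its result to a plain vector when there is only a single paragraph, and rowSums() then fails. A minimal sketch of a more defensive variant follows. The helper name count_words is hypothetical and not part of the original answer, and it assumes the words should be matched as literal substrings, as with fixed = TRUE above.

```r
# Hypothetical helper (name and signature are assumptions, not from the answer).
# Counts, for each paragraph, how many of the given words occur in it
# as a literal substring.
count_words <- function(words, paragraphs) {
  words <- as.character(words)
  paragraphs <- as.character(paragraphs)
  n <- length(paragraphs)

  # One logical vector of length n per word.
  hits <- vapply(
    words,
    function(w) grepl(w, paragraphs, fixed = TRUE),
    logical(n)
  )

  # vapply() returns a plain vector when n == 1, so force an
  # n-by-length(words) matrix before summing across the rows.
  hits <- matrix(hits, nrow = n)
  rowSums(hits)
}

words <- data.frame(w = c("afflict", "afraid", "aggressive", "annoying"))
paragraphs <- data.frame(p = c(
  "the phone will be too tall and bulky and annoying to tolerate use",
  "the only thing that i am afraid of if bernie sanders is president he may die in office",
  "Bernie Sanders may care about the issues, but does he really understand them? That is the question"))

paragraphs$count <- count_words(words$w, paragraphs$p)
```

Because the match is a literal substring match, a word such as "afraid" would also be counted inside "unafraid"; the original loop behaves the same way in that respect.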
Upvotes: 1