Nobel

Reputation: 1555

Enhance nested loops performance in R

I am looping over two data frames in R: the first contains a list of words and the second contains a list of paragraphs.

My task is to count how many of the words from the first data frame occur in each paragraph, and store that number in a "count" column of the second data frame.

I did it using nested for loops, but it takes a huge amount of time.

Is there any way to improve the performance of the code below?

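# Assumes paragraphs$count has already been initialised to 0
for (i in seq_len(nrow(words))) {
  for (j in seq_len(nrow(paragraphs))) {
    # Add 1 when word i occurs in paragraph j
    if (grepl(words[i, 1], paragraphs[j, 1])) {
      paragraphs[j, "count"] <- paragraphs[j, "count"] + 1
    }
  }
}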

Updated:

Sample Words:

afflict

afraid

aggressive

annoying

Sample Paragraphs:

the phone will be too tall and bulky and annoying to tolerate use

the only thing that i am afraid of if bernie sanders is president he may die in office

Bernie Sanders may care about the issues, but does he really understand them? That is the question

Upvotes: 0

Views: 64

Answers (1)

jogo

Reputation: 12559

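You can vectorise the inner loop. grepl() accepts a whole vector of paragraphs, so calling it once per word with sapply() gives a logical matrix (one row per paragraph, one column per word), and rowSums() then counts the matching words in each paragraph: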

# Sample data from the question
words <- data.frame(w = c("afflict", "afraid", "aggressive", "annoying"))
paragraphs <- data.frame(p = c("the phone will be too tall and bulky and annoying to tolerate use",
                               "the only thing that i am afraid of if bernie sanders is president he may die in office",
                               "Bernie Sanders may care about the issues, but does he really understand them? That is the question"))
# fixed = TRUE treats each word as a literal string rather than a regular expression
paragraphs$count <- rowSums(sapply(words[, 1], grepl, x = paragraphs[, 1], fixed = TRUE))
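One caveat with the rowSums(sapply(...)) form: if paragraphs has only a single row, sapply() simplifies its result to a plain vector and rowSums() fails. A minimal sketch of a variant that avoids this, assuming the sample data above, is to accumulate the per-word results directly. The helper name count_words is hypothetical (it is not part of base R), and the matching is still a case-sensitive substring match, as in the line above.

```r
# Sample data from the question (column names w and p are illustrative).
words <- data.frame(w = c("afflict", "afraid", "aggressive", "annoying"),
                    stringsAsFactors = FALSE)
paragraphs <- data.frame(
  p = c("the phone will be too tall and bulky and annoying to tolerate use",
        "the only thing that i am afraid of if bernie sanders is president he may die in office",
        "Bernie Sanders may care about the issues, but does he really understand them? That is the question"),
  stringsAsFactors = FALSE
)

# count_words() is a hypothetical helper, not a base R function.
# For each word, grepl() tests all paragraphs in one vectorised call;
# summing the logical results gives the number of matching words per paragraph.
count_words <- function(word_vec, text_vec) {
  counts <- integer(length(text_vec))
  for (w in word_vec) {
    counts <- counts + grepl(w, text_vec, fixed = TRUE)
  }
  counts
}

paragraphs$count <- count_words(words$w, paragraphs$p)
print(paragraphs$count)
```

This still loops over the words, but each iteration handles every paragraph at once, so the number of grepl() calls equals the number of words rather than words times paragraphs.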

Upvotes: 1
