nogbad

Reputation: 445

R - Count exact matches in string from list of words, then calculate overall sentiment using score per word

I have a dataset containing a column of strings from which I wish to calculate an overall sentiment score, and a data frame containing all the unique words that appear in the strings, each of which is assigned a score:

library(stringr)

df <- data.frame(text = c('recommend good value no problem','terrible quality no good','good service excellent quality commend'), score = 0)

words <- c('recommend','good','value','problem','terrible','quality','service','excellent','commend')
scores <- c(1,2,1,-2,-3,1,0,3,1)
wordsdf <- data.frame(words,scores)

The only way I have been able to get close to this is by using a nested for loop and the str_count function from the stringr package:

for (i in 1:3){
  count = 0
  for (j in 1:9){
    count <- count + (str_count(df$text[i],as.character(wordsdf$words[j])) * wordsdf$scores[j])
  }
  df$score[i] <- count
}

This almost achieves what I want:

                                    text score
1        recommend good value no problem     3
2               terrible quality no good     0
3 good service excellent quality commend     7

However, since the word 'commend' is also contained in the word 'recommend', my code calculates the scores as if both words are contained in the string.
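As a minimal check of this on the first string (n_sub is just an illustrative variable name), str_count treats the pattern as a substring match:

```r
library(stringr)

# 'commend' is not a separate word here, but it occurs inside 'recommend'
n_sub <- str_count('recommend good value no problem', 'commend')
n_sub
# → 1
```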

So I have two queries: 1 - Is there a way to get it to match only to exact words? 2 - Is there a way to achieve this without using the nested loop?

Upvotes: 1

Views: 449

Answers (1)

tmfmnk

Reputation: 39858

One tidyverse possibility could be:

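library(dplyr)
library(tidyr)
library(tibble)

df %>%
 rowid_to_column() %>%  # remember which row each word came from
 mutate(text = strsplit(as.character(text), " ", fixed = TRUE)) %>%  # split into words
 unnest(text) %>%  # one word per row
 full_join(wordsdf, by = c("text" = "words")) %>%  # exact whole-word match
 group_by(rowid) %>%
 summarise(text = paste(text, collapse = " "),
           scores = sum(scores, na.rm = TRUE)) %>%
 ungroup() %>%
 select(-rowid)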

  text                                   scores
  <chr>                                   <dbl>
1 recommend good value no problem             2
2 terrible quality no good                    0
3 good service excellent quality commend      7

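It first splits the "text" column into separate words, one per row. It then performs a full join of these words against wordsdf, so only exact whole-word matches pick up a score; words with no match (such as "no") get NA. Finally, it pastes the words of each row back together and sums the scores, ignoring the NA values. Note that a full join also keeps any words from wordsdf that appear in none of the strings, which would add an extra row with an NA rowid; using left_join() instead avoids that.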
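The same split-then-look-up idea could also be sketched in base R, without the nested loop or any packages. This is only one possible alternative, not part of the tidyverse code above; it assumes words are separated by single spaces, and lookup is an illustrative name:

```r
# Same data as in the question; stringsAsFactors = FALSE keeps text as character
df <- data.frame(
  text = c('recommend good value no problem',
           'terrible quality no good',
           'good service excellent quality commend'),
  stringsAsFactors = FALSE
)
words  <- c('recommend','good','value','problem','terrible','quality','service','excellent','commend')
scores <- c(1,2,1,-2,-3,1,0,3,1)

# Named lookup vector: word -> score
lookup <- setNames(scores, words)

# Split each string on spaces, look up every token by name, and sum.
# Tokens that are not in the lookup (e.g. 'no') give NA and are dropped.
df$score <- vapply(
  strsplit(df$text, " ", fixed = TRUE),
  function(tokens) sum(lookup[tokens], na.rm = TRUE),
  numeric(1)
)

df$score
# → 2 0 7
```

Because each token is looked up by its full name, 'recommend' can never match 'commend'.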

Upvotes: 3
