Wordcloud from Data Table in R

Question

I have a data table made from positive and negative word associations. I would like to create two wordclouds, one for positive words and one for negative words.

Example of sentiment_words table:

          element_id    sentence_id   negative     positive
1115:          1        1115          limits       agree,available
1116:          1        1116          slow         strongly,agree
1117:          1        1117                       management
1118:          1        1118                                      
1119:          1        1119          concerns     strongly,agree,better,

I am using library(wordcloud) and library(sentimentr).

For example, how do I pull only the words from the "positive" column to create a wordcloud? I'm not sure how to handle the fact that each row can contain multiple words (e.g., "agree,available" should be treated as two entries).

I've tried several variations of the wordcloud() call, such as wordcloud(words = sentiment_words$positive, freq = 3, min.freq = 1, max.words = 200, random.order = FALSE, rot.per=0.35, colors=brewer.pal(8, "Dark2")), but each one returns a cloud containing only the term from the first entry.

Edit: I've tried the tidyverse answer below, and the result I get is:

   words                n
 1 " "ability""         3
 2 " "ability")"        1
 3 " "acceptable")"     1
 4 " "accomplish""      1
 5 " "accomplished")"   1
 6 " "accountability""  1
 7 " "accountability")" 1
 8 " "accountable""     2
 9 " "accountable")"    1

I've tried multiple variants of gsub() and apply() to remove the extra ")" and "c(", but haven't found anything that works yet. As a result, words that should be counted together are counted separately (e.g., "acceptable" and "acceptable)" appear as two different words in the wordcloud).
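The stray c(, ) and character(0) fragments are consistent with positive and negative being list columns (one character vector per sentence) that get coerced to text before being split on commas. A minimal sketch reproducing this, assuming that is what is happening here; the terms list is hypothetical toy data:

```r
# Hypothetical list column: one character vector of terms per sentence,
# including an empty vector for a sentence with no matching terms
terms <- list(c("agree", "available"), "management", character(0))

# Coercing the list to character deparses each element:
# 'c("agree", "available")', "management", "character(0)"
txt <- as.character(terms)
print(txt)

# Splitting the deparsed text on commas yields the stray fragments
# 'c("agree"' and ' "available")', like the output shown above
strsplit(txt[1], ",")[[1]]
```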

Edit: To get it to work correctly, I first had to clean up sentiment_words as suggested below.

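for (j in seq_along(sentiment_words)) {
  # Without fixed = TRUE this pattern is a regex in which "(0)" is a group,
  # so it does not match the literal text "character(0)" (hence the filter below)
  sentiment_words[[j]] <- gsub("character(0)", "", sentiment_words[[j]])
  sentiment_words[[j]] <- gsub('"', "", sentiment_words[[j]])                 # drop double quotes
  sentiment_words[[j]] <- gsub("c(", "", sentiment_words[[j]], fixed = TRUE)  # drop "c("
  sentiment_words[[j]] <- gsub(" ", "", sentiment_words[[j]])                 # drop spaces
  sentiment_words[[j]] <- gsub(")", "", sentiment_words[[j]], fixed = TRUE)   # drop ")"
}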

I also had to filter out the remaining "character(0" strings within the count_words function. Note that it filters "character(0" rather than "character(0)" because the loop above removes the closing parenthesis:

filter(!!var != "character(0") %>%

Implementing the above produced the cleanest word clouds by text polarity.
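If the columns are indeed list columns (an assumption; data.table prints such columns with their elements joined by commas, which would match the table above), an alternative to the gsub() cleanup is to flatten each column with unlist() and count with table(). A minimal base-R sketch, not the approach used in this thread; the sentiment_words object below is a hypothetical stand-in built to mirror the example table:

```r
# Hypothetical stand-in for sentiment_words, with list columns
sentiment_words <- data.frame(element_id = 1, sentence_id = 1115:1119)
sentiment_words$negative <- list("limits", "slow", character(0), character(0),
                                 "concerns")
sentiment_words$positive <- list(c("agree", "available"), c("strongly", "agree"),
                                 "management", character(0),
                                 c("strongly", "agree", "better"))

# unlist() flattens each list column into one character vector
# (empty character(0) entries simply disappear); table() counts the words
pos_counts <- table(unlist(sentiment_words$positive))
neg_counts <- table(unlist(sentiment_words$negative))

pos_counts
neg_counts
```

The resulting tables can be passed to wordcloud() as words = names(pos_counts), freq = as.vector(pos_counts), with no string cleanup needed.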

Maurits Evers · Accepted Answer

Here is a tidyverse-based approach that should get you started. I agree with Mr_Z in that I'm not entirely clear on where the problem is.

Let's define a function that returns a data.frame of word counts, based on the comma-separated words in column var of your source data df.

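library(tidyverse)
# Count the comma-separated words in column `var` of `df`
count_words <- function(df, var) {
    var <- enquo(var)                          # capture the column name unevaluated
    df %>%
        separate_rows(!!var, sep = ",") %>%    # one row per word
        filter(!!var != "") %>%                # drop empty cells and trailing commas
        group_by(!!var) %>%
        summarise(n = n()) %>%                 # frequency of each word
        rename(words = !!var)
}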
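For readers who would rather avoid the tidyverse dependency, the same counting can be sketched in base R with strsplit() and table(). This is a swapped-in technique, not part of the answer above; the function name count_words_base and the example_df data are hypothetical, and it assumes the column holds plain comma-separated strings rather than list columns:

```r
# Hypothetical base-R counterpart of count_words(): split a column of
# comma-separated strings into words and return a data.frame of counts
count_words_base <- function(df, col) {
  words <- unlist(strsplit(as.character(df[[col]]), ",", fixed = TRUE))
  words <- trimws(words)                          # tolerate "agree, available"
  words <- words[!is.na(words) & words != ""]     # drop empty cells / trailing commas
  counts <- table(words)
  data.frame(words = names(counts), n = as.vector(counts),
             stringsAsFactors = FALSE)
}

# Hypothetical data mirroring the example table in the question
example_df <- data.frame(
  negative = c("limits", "slow", "", "", "concerns"),
  positive = c("agree,available", "strongly,agree", "management", "",
               "strongly,agree,better,"),
  stringsAsFactors = FALSE
)

count_words_base(example_df, "positive")
```

Unlike the tidyverse version, the column is passed as a string ("positive"), which avoids tidy evaluation altogether.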

We can then generate word counts for the positive and negative columns:

df.pos <- count_words(df, positive)
df.neg <- count_words(df, negative)

Let's inspect the data.frames:

df.pos
# A tibble: 5 x 2
  words          n
  <chr>      <int>
1 agree          3
2 available      1
3 better         1
4 management     1
5 strongly       2

df.neg
# A tibble: 3 x 2
  words        n
  <chr>    <int>
1 concerns     1
2 limits       1
3 slow         1

Let's plot the word clouds:

library(wordcloud)
wordcloud(words = df.pos$words, freq = df.pos$n, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))

wordcloud(words = df.neg$words, freq = df.neg$n, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
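To compare the two clouds side by side, one option is to split the plotting area with par(mfrow) before calling wordcloud() twice. A minimal sketch; the word and count vectors are hypothetical stand-ins mirroring df.pos and df.neg above:

```r
library(wordcloud)  # also attaches RColorBrewer, which provides brewer.pal()

# Hypothetical counts in the same shape as df.pos and df.neg above
pos_words <- c("agree", "available", "better", "management", "strongly")
pos_n     <- c(3, 1, 1, 1, 2)
neg_words <- c("concerns", "limits", "slow")
neg_n     <- c(1, 1, 1)

op <- par(mfrow = c(1, 2))  # one row, two plotting panels

# min.freq = 1 is needed because wordcloud() drops words below min.freq
wordcloud(words = pos_words, freq = pos_n, min.freq = 1,
          random.order = FALSE, colors = brewer.pal(8, "Dark2"))
wordcloud(words = neg_words, freq = neg_n, min.freq = 1,
          random.order = FALSE, colors = brewer.pal(8, "Dark2"))

par(op)  # restore the previous panel layout
```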
