coolhand

Reputation: 2061

Wordcloud from Data Table in R

I have a data table made from positive and negative word associations. I would like to create two wordclouds, one for positive words and one for negative words.

Example of sentiment_words table:

          element_id    sentence_id   negative     positive
1115:          1        1115          limits       agree,available
1116:          1        1116          slow         strongly,agree
1117:          1        1117                       management
1118:          1        1118                                      
1119:          1        1119          concerns     strongly,agree,better,

I am using library(wordcloud) and library(sentimentr)

For example, how do I pull only the words from the "positive" column to create a wordcloud? I'm not sure how to handle rows that hold several words (e.g., "agree,available" should be treated as two entries).

I've tried several variants of the wordcloud() call, such as wordcloud(words = sentiment_words$positive, freq = 3, min.freq = 1, max.words = 200, random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2")), but this only returns a cloud containing the term from the first entry.
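For reference, wordcloud() takes one frequency per word, so the comma-separated cells need to be split and counted before plotting. A minimal base R sketch of that step, assuming the column holds plain comma-separated strings as in the table above (tokens and word_freq are illustrative names, not part of either package):

```r
# Values copied from the "positive" column of the example table above.
positive <- c("agree,available", "strongly,agree", "management", "",
              "strongly,agree,better,")

# Split each row on commas, flatten to one vector, and drop blanks.
tokens <- trimws(unlist(strsplit(positive, ",", fixed = TRUE)))
tokens <- tokens[nzchar(tokens)]

# One row per distinct word, with its count.
counts <- table(tokens)
word_freq <- data.frame(words = names(counts), n = as.integer(counts),
                        stringsAsFactors = FALSE)
print(word_freq)

# Plot only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = word_freq$words, freq = word_freq$n,
                       min.freq = 1, random.order = FALSE)
}
```

The same steps applied to the "negative" column would give the second cloud.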

Edit: I've tried the tidyverse answer below, and the result I get is:

  words                        n
  <chr>                    <int>
1 " \"ability\""               3
2 " \"ability\")"              1
3 " \"acceptable\")"           1
4 " \"accomplish\""            1
5 " \"accomplished\")"         1
6 " \"accountability\""        1
7 " \"accountability\")"       1
8 " \"accountable\""           2
9 " \"accountable\")"          1

I've tried multiple variants of gsub() and apply() to remove the extra ) and c( but haven't found anything that works yet. As a result, words that should be counted together are counted separately (e.g., "acceptable" and "acceptable)" are two different words in the wordcloud).

Edit: In order to get it to work correctly, I had to first clean up my sentiment_words as suggested below.

for (j in seq(sentiment_words)) {
  sentiment_words[[j]] <- gsub("character(0)", "", sentiment_words[[j]])
  sentiment_words[[j]] <- gsub('"', "", sentiment_words[[j]])
  sentiment_words[[j]] <- gsub("c\\(", "", sentiment_words[[j]])
  sentiment_words[[j]] <- gsub(" ", "", sentiment_words[[j]])
  sentiment_words[[j]] <- gsub("\\)", "", sentiment_words[[j]])  
}

I also had to filter out the remaining "character(0" strings inside the count_words function. Note that the filter matches "character(0" rather than "character(0)" because the loop above had already stripped the closing parenthesis:

filter(!!var != "character(0") %>%

Implementing the above gave the cleanest wordcloud based on polarity of text
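A likely reason the "character(0" strings survive the loop: gsub() treats its pattern as a regular expression by default, so the parentheses in "character(0)" form a group and the pattern matches the text "character0" rather than the literal "character(0)". A minimal sketch of how fixed = TRUE changes that, using made-up strings shaped like the coerced column values (x, regex_try, fixed_try, and cleaned are illustrative names):

```r
# Hypothetical values shaped like a list column after coercion to character.
x <- c('c("strongly", "agree")', "character(0)", "management")

# Default (regex) matching: "(0)" is a group, so nothing is replaced.
regex_try <- gsub("character(0)", "", x)

# Literal matching: the whole "character(0)" string is removed.
fixed_try <- gsub("character(0)", "", x, fixed = TRUE)

# Strip the remaining c( ... ) wrapper, quotes, and spaces in one pass.
cleaned <- gsub('c\\(|\\)|"| ', "", fixed_try)
print(cleaned)
```

With the literal match in place, the extra filter on "character(0" should no longer be needed, since the existing filter on empty strings would cover those rows.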

Upvotes: 1

Views: 1315

Answers (2)

Tyler Rinker

Reputation: 109894

I would strongly advise against using the accepted answer here, as it ignores that sentimentr already returns the computed counts for you (via attributes(sentiment_words)$counts). The documentation for extract_sentiment_terms shows examples that make this clearer (there was room for improving the documentation about what is returned, and this has been added in the dev version: https://github.com/trinker/sentimentr/blob/master/R/extract_sentiment_terms.R). Below I show how to extract the counts for use in a wordcloud, along with some potential layouts:

library(sentimentr)
library(wordcloud)
library(data.table)

set.seed(10)
x <- get_sentences(sample(hu_liu_cannon_reviews[[2]], 1000, TRUE))
sentiment_words <- extract_sentiment_terms(x)

sentiment_counts <- attributes(sentiment_words)$counts
sentiment_counts[polarity > 0,]

par(mfrow = c(1, 3), mar = c(0, 0, 0, 0))
## Positive Words
with(
    sentiment_counts[polarity > 0,],
    wordcloud(words = words, freq = n, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"), scale = c(4.5, .75)
    )
)
mtext("Positive Words", side = 3, padj = 5)

## Negative Words
with(
    sentiment_counts[polarity < 0,],
    wordcloud(words = words, freq = n, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"), scale = c(4.5, 1)
    )
)
mtext("Negative Words", side = 3, padj = 5)

sentiment_counts[, 
    color := ifelse(polarity > 0, 'red', 
        ifelse(polarity < 0, 'blue', 'gray70')
    )]

## Together
with(
    sentiment_counts[polarity != 0,],
    wordcloud(words = words, freq = n, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = color, ordered.colors = TRUE, scale = c(5, .75)
    )
)
mtext("Positive (red) & Negative (blue) Words", side = 3, padj = 5)


Upvotes: 0

Maurits Evers

Reputation: 50698

Here is a tidyverse-based approach that should get you started. I agree with Mr_Z in that I'm not entirely clear on where the problem is.

  1. Let's define a function that generates a data.frame with the word count based on comma-separated words in a specific column var of your source data df.

    library(tidyverse)
    count_words <- function(df, var) {
        var <- enquo(var)
        df %>%
            separate_rows(!!var, sep = ",") %>%
            filter(!!var != "") %>%
            group_by(!!var) %>%
            summarise(n = n()) %>%
            rename(words = !!var)
    }
    
  2. We can then generate word counts for the positive and negative columns

    df.pos <- count_words(df, positive)
    df.neg <- count_words(df, negative)
    

    Let's inspect the data.frames

    df.pos
    # A tibble: 5 x 2
      words          n
      <chr>      <int>
    1 agree          3
    2 available      1
    3 better         1
    4 management     1
    5 strongly       2
    
    df.neg
    # A tibble: 3 x 2
      words        n
      <chr>    <int>
    1 concerns     1
    2 limits       1
    3 slow         1
    
  3. Let's plot the word clouds

    library(wordcloud)
    wordcloud(words = df.pos$words, freq = df.pos$n, min.freq = 1,
              max.words = 200, random.order = FALSE, rot.per = 0.35,
              colors = brewer.pal(8, "Dark2"))
    


    wordcloud(words = df.neg$words, freq = df.neg$n, min.freq = 1,
              max.words = 200, random.order = FALSE, rot.per = 0.35,
              colors = brewer.pal(8, "Dark2"))
    


Upvotes: 2
