Reputation: 2061
I have a data table made from positive and negative word associations. I would like to create two wordclouds, one for positive words and one for negative words.
Example of the sentiment_words table:
      element_id sentence_id negative positive
1115:          1        1115   limits agree,available
1116:          1        1116     slow strongly,agree
1117:          1        1117          management
1118:          1        1118
1119:          1        1119 concerns strongly,agree,better,
I am using library(wordcloud) and library(sentimentr).
For example, how do I pull only the words from the "positive" column to create a wordcloud? I'm not sure how to address the fact that there are multiple words associated with each row (e.g., "agree, available" should be treated as two entries)
I've made different attempts at the wordcloud() function, such as
wordcloud(words = sentiment_words$positive, freq = 3, min.freq = 1, max.words = 200, random.order = FALSE, rot.per=0.35, colors=brewer.pal(8, "Dark2"))
but this only returns a cloud with the term in the first entry
Edit: I've tried the tidyverse answer below, and the result I get is:
words n
<chr> <int>
1 " \"ability\"" 3
2 " \"ability\")" 1
3 " \"acceptable\")" 1
4 " \"accomplish\"" 1
5 " \"accomplished\")" 1
6 " \"accountability\"" 1
7 " \"accountability\")" 1
8 " \"accountable\"" 2
9 " \"accountable\")" 1
I've tried multiple variants of gsub() and apply to remove the extra ) and c( but haven't found anything that works yet. As a result, words that should be counted together are counted separately (e.g., "acceptable" and "acceptable)" are two different words in the wordcloud).
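A minimal sketch of one way to strip this debris, assuming the stray characters are only the leading c(, closing parentheses, double quotes, and spaces seen in the output above. The sample vector x is made up for illustration and is not the real data.

```r
# Hypothetical tokens shaped like the output above
x <- c('c("ability"', ' "ability")', ' "acceptable")')

# Remove a leading c( and every double quote, closing parenthesis, or space
cleaned <- gsub('^c\\(|[") ]', "", x)

# Tokens that differed only by debris now collapse to the same word
table(cleaned)
```

The same pattern could be applied to the words column before counting, so that variants of one word are grouped together.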
Edit: To get it to work correctly, I first had to clean up my sentiment_words as suggested below.
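# Clean every column of sentiment_words in place
for (j in seq(sentiment_words)) {
  # As a regex, the parentheses here form a group, so the literal
  # string "character(0)" is not matched by this call
  sentiment_words[[j]] <- gsub("character(0)", "", sentiment_words[[j]])
  sentiment_words[[j]] <- gsub('"', "", sentiment_words[[j]])     # drop double quotes
  sentiment_words[[j]] <- gsub("c\\(", "", sentiment_words[[j]])  # drop "c("
  sentiment_words[[j]] <- gsub(" ", "", sentiment_words[[j]])     # drop spaces
  sentiment_words[[j]] <- gsub("\\)", "", sentiment_words[[j]])   # drop ")"
}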
I also had to filter out the remaining "character(0" strings within the count_words function. Note that it filters "character(0" and not "character(0)" because I removed the closing parenthesis above:
filter(!!var != "character(0") %>%
Implementing the above gave the cleanest wordcloud based on polarity of text
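One likely reason the "character(0" strings survive the cleanup: gsub() treats its pattern as a regular expression by default, so the parentheses in "character(0)" act as a group rather than literal characters. A small sketch of the difference, using a made-up vector x rather than the real data:

```r
# Hypothetical cell values for illustration only
x <- c('c("strongly", "agree")', "character(0)")

# Default regex matching: "(0)" is a group, so the pattern looks for
# "character0" and leaves the placeholder untouched
regex_result <- gsub("character(0)", "", x)

# Literal matching with fixed = TRUE removes the placeholder
literal_result <- gsub("character(0)", "", x, fixed = TRUE)

regex_result
literal_result
```

With fixed = TRUE in the first gsub() call, the extra filter on "character(0" inside count_words may no longer be needed, since the existing filter on empty strings would catch those rows.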
Upvotes: 1
Views: 1315
Reputation: 109894
I would strongly advise against using the accepted answer here, as it ignores that sentimentr already returns the computed counts for you (via attributes(sentiment_words)$counts). The documentation for extract_sentiment_terms shows examples that make this clearer (there was room for improving the documentation about what is returned, and this has been added in the dev version: https://github.com/trinker/sentimentr/blob/master/R/extract_sentiment_terms.R). Below I show how to extract the counts for use in a wordcloud and some potential layouts:
library(sentimentr)
library(wordcloud)
library(data.table)
set.seed(10)
x <- get_sentences(sample(hu_liu_cannon_reviews[[2]], 1000, TRUE))
sentiment_words <- extract_sentiment_terms(x)
sentiment_counts <- attributes(sentiment_words)$counts
sentiment_counts[polarity > 0,]
par(mfrow = c(1, 3), mar = c(0, 0, 0, 0))
## Positive Words
with(
    sentiment_counts[polarity > 0,],
    wordcloud(words = words, freq = n, min.freq = 1,
        max.words = 200, random.order = FALSE, rot.per = 0.35,
        colors = brewer.pal(8, "Dark2"), scale = c(4.5, .75)
    )
)
mtext("Positive Words", side = 3, padj = 5)
## Negative Words
with(
    sentiment_counts[polarity < 0,],
    wordcloud(words = words, freq = n, min.freq = 1,
        max.words = 200, random.order = FALSE, rot.per = 0.35,
        colors = brewer.pal(8, "Dark2"), scale = c(4.5, 1)
    )
)
mtext("Negative Words", side = 3, padj = 5)
sentiment_counts[,
    color := ifelse(polarity > 0, 'red',
        ifelse(polarity < 0, 'blue', 'gray70')
    )]
## Together
with(
    sentiment_counts[polarity != 0,],
    wordcloud(words = words, freq = n, min.freq = 1,
        max.words = 200, random.order = FALSE, rot.per = 0.35,
        colors = color, ordered.colors = TRUE, scale = c(5, .75)
    )
)
mtext("Positive (red) & Negative (blue) Words", side = 3, padj = 5)
Upvotes: 0
Reputation: 50698
Here is a tidyverse-based approach that should get you started. I agree with Mr_Z in that I'm not entirely clear on where the problem is.
Let's define a function that generates a data.frame with the word count based on comma-separated words in a specific column var of your source data df.
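library(tidyverse)
# Count the comma-separated words in column `var` of `df`
count_words <- function(df, var) {
    var <- enquo(var)                        # capture the unquoted column name
    df %>%
        separate_rows(!!var, sep = ",") %>%  # one row per word
        filter(!!var != "") %>%              # drop empty entries
        group_by(!!var) %>%
        summarise(n = n()) %>%               # occurrences of each word
        rename(words = !!var)
}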
We can then generate word counts for the positive and negative columns:
df.pos <- count_words(df, positive)
df.neg <- count_words(df, negative)
Let's inspect the data.frames:
df.pos
# A tibble: 5 x 2
words n
<chr> <int>
1 agree 3
2 available 1
3 better 1
4 management 1
5 strongly 2
df.neg
# A tibble: 3 x 2
words n
<chr> <int>
1 concerns 1
2 limits 1
3 slow 1
Let's plot the word clouds
library(wordcloud)
wordcloud(words = df.pos$words, freq = df.pos$n, min.freq = 1,
    max.words = 200, random.order = FALSE, rot.per = 0.35,
    colors = brewer.pal(8, "Dark2"))
wordcloud(words = df.neg$words, freq = df.neg$n, min.freq = 1,
    max.words = 200, random.order = FALSE, rot.per = 0.35,
    colors = brewer.pal(8, "Dark2"))
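For readers who prefer to avoid the tidyverse dependency, the same split-and-count step can be sketched in base R. This is only an illustration: count_words_base is a hypothetical helper, and the toy df below assumes the column layout from the question's example table.

```r
# Toy data mirroring the question's example (assumed layout)
df <- data.frame(
  negative = c("limits", "slow", "", "", "concerns"),
  positive = c("agree,available", "strongly,agree", "management", "",
               "strongly,agree,better,"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on commas, drop empty entries, count each word
count_words_base <- function(x) {
  words <- trimws(unlist(strsplit(x, ",", fixed = TRUE)))
  words <- words[nzchar(words)]
  counts <- table(words)
  data.frame(words = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

pos_counts <- count_words_base(df$positive)
neg_counts <- count_words_base(df$negative)
pos_counts
```

The resulting words and n columns can be passed to wordcloud() in the same way as df.pos and df.neg above.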
Upvotes: 2