Reputation: 2879
I have a set of reviews (comment in words + rating from 0-10) and I want to create a sentiment word cloud in R, in which:
I used quanteda to create a dfm of the comments. Now I think I want to use the textplot_wordcloud function, and I guess I need to do the following:
color=RColorBrewer::brewer.pal(11, "RdYlGn") to calculate colors from the average ratings
I'm having trouble with steps 1 and 4. The rest should be doable. Can somebody explain how a dfm can be read and manipulated easily?
Upvotes: 1
Views: 374
Reputation: 2879
I found an efficient way to do this using matrix multiplication. Basically, the functionality is sw = sd * C / Nw, where:
sw = sentiment per word
sd = ratings per document
C = per-document word frequency matrix
Nw = number of occurrences per word
In code:
# create the necessary variables
sd <- as.integer(df$rating)
C <- as.matrix(my_dfm)
Nw <- as.integer(colSums(C))
# calculate the word sentiment
sw <- as.numeric(sd %*% C) / Nw
# normalize the word sentiment to values between 0 and 1
sw <- (sw - min(sw)) / (max(sw) - min(sw))
# make a function that converts a sentiment value to a color
# (seq_gradient_pal comes from the scales package)
num_to_color <- scales::seq_gradient_pal(low="#FF0000", high="#00FF00")
# apply the function to the sentiment values
word_colors <- num_to_color(sw)
# create a new window;
# before executing the next command, maximize it manually to get a more readable wordcloud
dev.new()
# create the wordcloud with the calculated color values
textplot_wordcloud(my_dfm, color=word_colors)
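
To make the matrix step easier to check by hand, here is a minimal sketch of sw = sd * C / Nw on toy data. It uses base R only; the three documents, the three words, the ratings, and the variable names (ratings, counts, word_sentiment) are made up for illustration and stand in for df$rating and as.matrix(my_dfm) above.

```r
# Toy data (hypothetical): three reviews with ratings on the 0-10 scale
ratings <- c(10, 2, 6)

# Per-document word counts, one row per review, one column per word;
# this plays the role of C <- as.matrix(my_dfm)
counts <- matrix(
  c(1, 0, 1,
    0, 2, 1,
    1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("d1", "d2", "d3"), c("great", "bad", "food"))
)

# Sum of ratings over all occurrences of each word, divided by the
# number of occurrences: a count-weighted average rating per word
word_sentiment <- drop(ratings %*% counts) / colSums(counts)

# Rescale to [0, 1] so the values can be fed to a color palette function
word_sentiment_scaled <- (word_sentiment - min(word_sentiment)) /
  (max(word_sentiment) - min(word_sentiment))

print(word_sentiment)
print(word_sentiment_scaled)
```

In this toy case "great" appears once each in the reviews rated 10 and 6, so its sentiment is (10 + 6) / 2 = 8, while "bad" appears twice in the review rated 2 and once in the review rated 6, giving (2 + 2 + 6) / 3. Note that the rescaling divides by zero if every word ends up with the same sentiment.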
Upvotes: 2