Lewistrick

Reputation: 2879

Sentiment wordcloud using R's quanteda?

I have a set of reviews (comment in words + rating from 0-10) and I want to create a sentiment word cloud in R, in which:

I used quanteda to create a dfm of the comments. Now I think I want to use the textplot_wordcloud function and I guess I need to do the following:

  1. For each word, get all the reviews it appeared in
  2. Calculate the average rating of this subset of reviews
  3. Divide by 10 to scale to 0-1 and assign this value to this word
  4. Sort the words by average rating (so that the colors are assigned correctly?)
  5. Use color=RColorBrewer::brewer.pal(11, "RdYlGn") to calculate colors from the average ratings

I'm having trouble with steps 1 and 4. The rest should be doable. Can somebody explain how a dfm can be read and manipulated easily?

Upvotes: 1

Views: 374

Answers (1)

Lewistrick

Reputation: 2879

I found an efficient way to do this using matrix multiplication. The formula is sw = (sd * C) / Nw, where * is a matrix product and / is an element-wise division:

  • sw = sentiment per word
  • sd = ratings per document
  • C = per-document word frequency matrix
  • Nw = total number of occurrences per word (the column sums of C)
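
To make the formula concrete, here is a minimal base R sketch on a made-up 3-review, 3-word count matrix. The names ratings, counts and word_sentiment, and all the numbers, are invented for illustration and are not from my data:

```r
# hypothetical toy data: three reviews with ratings on the 0-10 scale
ratings <- c(8, 2, 10)

# hypothetical per-document word counts (rows = reviews, columns = words)
counts <- matrix(
  c(1, 0, 2,
    0, 1, 1,
    1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("d1", "d2", "d3"), c("good", "bad", "ok"))
)

# sd * C: sum of ratings per word, weighted by how often the word occurs
weighted_sums <- drop(ratings %*% counts)

# Nw: total number of occurrences per word
occurrences <- colSums(counts)

# sw: average rating per word
word_sentiment <- weighted_sums / occurrences
print(word_sentiment)
```

Note that this weights each review by the number of times the word occurs in it. Replacing counts with (counts > 0) would instead give the plain average rating of the reviews that contain the word.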

In code:

# load the packages used below; seq_gradient_pal comes from scales
# (in quanteda 3 and later, textplot_wordcloud lives in quanteda.textplots)
library(quanteda)
library(scales)

# create the necessary variables
sd <- as.integer(df$rating)   # ratings per document
C <- as.matrix(my_dfm)        # per-document word frequency matrix
Nw <- as.integer(colSums(C))  # number of occurrences per word

# calculate the word sentiment
sw <- as.integer(sd %*% C) / Nw

# normalize the word sentiment to values between 0 and 1
sw <- (sw - min(sw)) / (max(sw) - min(sw))

# make a function that converts a sentiment value to a color
num_to_color <- seq_gradient_pal(low="#FF0000", high="#00FF00")

# apply the function to the sentiment values
word_colors <- num_to_color(sw)

# create a new window; 
# before executing the next command, manually maximize in order to get a better readable wordcloud
dev.new()

# create the wordcloud with the calculated color values
textplot_wordcloud(my_dfm, color=word_colors)
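
If a red-yellow-green scale like the one mentioned in the question is preferred over a plain red-to-green gradient, one possible sketch uses base R's grDevices instead of scales. This is an alternative to the seq_gradient_pal call, not what I ran; sw_example and word_colors_3stop are hypothetical names, and the three values stand in for already normalized sentiment scores:

```r
# hypothetical normalized sentiment values in [0, 1]
sw_example <- c(0, 0.5, 1)

# colorRamp returns a function mapping [0, 1] to RGB values (0-255)
ramp <- grDevices::colorRamp(c("#FF0000", "#FFFF00", "#00FF00"))

# convert the RGB matrix (one row per value) to hex color strings
word_colors_3stop <- grDevices::rgb(ramp(sw_example), maxColorValue = 255)
print(word_colors_3stop)
```

The resulting vector could then be passed as the color argument in the same way as word_colors above.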

Upvotes: 2
