Kasi

Reputation: 245

How to specify word in word embeddings in R

I have replicated the word embeddings code from Ben Schmidt's excellent tutorial on word2vec in R.

With a trained model, the code below finds ALL food terms in the corpus, and plots the terms close to the y-axis "salty" or x-axis "sweet".

tastes = model[[c("sweet", "salty"), average = FALSE]]
sweet_and_saltiness = model[1:3000, ] %>% cosineSimilarity(tastes)

plot(sweet_and_saltiness,type='n')
text(sweet_and_saltiness,labels=rownames(sweet_and_saltiness))

This works great, but how can I specify the food words I want to plot? Let's say I only care about "salmon" and "tuna" and want to plot those two only?

I tried to filter it with sweet_and_saltiness = sweet_and_saltiness[c("salmon","tuna")], but it didn't work.

My apologies for not providing a reproducible example; I'm not sure how to do so, since I'm using a trained model.

I found a similar question here on SO but it's for Python, not R.

Edit:

sweet_and_saltiness is a matrix that contains many terms. Below are some of them:

structure(c(0.528436401795892, 1, 0.563471203034216, 0.502205073864983, 
0.0589914271300194, -0.0237981616846065, -0.0657883169365425, 
-0.0558463233095463, 0.15991116770716, 0.13954111689771, 0.064859364561648, 
0.0109053881576116, 0.387838863143423, 0.366834478524629, 0.349148925405899, 
0.338632667643554), .Dim = c(8L, 2L), .Dimnames = list(c("very", 
"sweet", "red", "rich", "olive_oil", "if_necessary", "until_done", 
"15_minutes"), c("sweet", "salty")))

The figures are the plot coordinates for all the terms (red, rich, olive_oil, etc.). My question is: how can I exclude all the other words from the plot and show only the words I'm interested in? (Assume the words are in the sweet_and_saltiness matrix.)

Upvotes: 1

Views: 197

Answers (1)

Rui Barradas

Reputation: 76402

An option with ggplot2 graphics could be the following.
ggplot2 expects a data.frame in long format, whereas the data here is a matrix in wide format, so it has to be reshaped before plotting. See this post on how to reshape the data from wide to long format.

library(dplyr)
library(tidyr)
library(ggplot2)

wanted_fish <- c("red", "until_done")

sweet_and_saltiness %>%
  as.data.frame() %>%
  mutate(fish = rownames(.)) %>%
  pivot_longer(-fish) %>%
  filter(fish %in% wanted_fish) %>%
  ggplot(aes(name, value)) +
  geom_text(aes(label = fish)) +
  theme_bw()
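
For the base-graphics plot from the question, a minimal sketch of the same filtering is shown below. The attempt in the question most likely failed because it omits the comma: without it, the matrix is indexed as a plain vector, which has no element names, so the result is NA. The sketch assumes the wanted words exist as row names; `wanted` and `subset_scores` are illustrative names, and "red" and "until_done" stand in for "salmon" and "tuna", which are not in the sample data.

```r
# Sample rows taken from the dput() output in the question
sweet_and_saltiness <- structure(
  c(0.528436401795892, 1, 0.563471203034216, 0.502205073864983,
    0.0589914271300194, -0.0237981616846065, -0.0657883169365425,
    -0.0558463233095463, 0.15991116770716, 0.13954111689771,
    0.064859364561648, 0.0109053881576116, 0.387838863143423,
    0.366834478524629, 0.349148925405899, 0.338632667643554),
  .Dim = c(8L, 2L),
  .Dimnames = list(
    c("very", "sweet", "red", "rich", "olive_oil", "if_necessary",
      "until_done", "15_minutes"),
    c("sweet", "salty")))

# Stand-ins for c("salmon", "tuna"); illustrative only
wanted <- c("red", "until_done")

# Keep only words that are actually present, to avoid a
# "subscript out of bounds" error for missing row names
wanted <- intersect(wanted, rownames(sweet_and_saltiness))

# Note the comma: select rows by name and keep all columns.
# drop = FALSE keeps the result a matrix even for a single word.
subset_scores <- sweet_and_saltiness[wanted, , drop = FALSE]

# Same plotting calls as in the question, on the filtered matrix
plot(subset_scores, type = "n")
text(subset_scores, labels = rownames(subset_scores))
```

This keeps "sweet" on the x-axis and "salty" on the y-axis, as in the original plot, because the column order of the matrix is unchanged.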


Upvotes: 1
