Reputation: 335

random points when visualizing word2vec embeddings using TSNE

I have created a word2vec model and made a visualization of the top n most similar words for a particular term using TSNE and matplotlib. What I do not understand is that when I run it multiple times, the same words are plotted at different positions, even though the words and vectors are the same each time. Why is this the case? I have a feeling it has to do with the way TSNE reduces the dimensionality of the vectors. If so, is this method of visualization really reliable, since it will be different every time?

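import matplotlib.pyplot as plt
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

model = Word2Vec.load("a_w2v_model")

# Words most similar to "king" and their vectors (the vectors live on model.wv)
topn_words_list = [word for word, _ in model.wv.most_similar("king", topn=3)]
topn_vectors_list = model.wv[topn_words_list]

# perplexity must be smaller than the number of points being embedded;
# max_iter was called n_iter before scikit-learn 1.5
tsne = TSNE(n_components=2, verbose=1, perplexity=2, max_iter=300)
Y = tsne.fit_transform(topn_vectors_list)

fig, ax = plt.subplots()
ax.plot(Y[:, 0], Y[:, 1], 'o')
ax.set_yticklabels([])  # hide tick labels
ax.set_xticklabels([])  # hide tick labels

# Label each point with its word
for i, word in enumerate(topn_words_list):
    ax.annotate(word, xy=(Y[i, 0], Y[i, 1]))
plt.show()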

Upvotes: 1

Answers (2)

Venkatachalam

Reputation: 16966

As mentioned by @Nischal Sanil, t-SNE is a non-deterministic dimensionality reduction technique. That's why scikit-learn's TSNE implementation has a random_state parameter.

Hence, to get the same result every time, set random_state to a fixed integer (for example, random_state=42).
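Here is a minimal sketch of that, assuming random vectors as a hypothetical stand-in for the word vectors (in the question they would come from model.wv[topn_words_list]). It runs TSNE twice with the same random_state and once with a different one; init="random" is set explicitly so that the starting layout depends on the seed. With the same seed the two layouts should match, and with a different seed they should not.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for the word vectors: 40 random 100-dimensional rows.
# In the question these would come from model.wv[topn_words_list].
rng = np.random.default_rng(0)
vectors = rng.normal(size=(40, 100))


def embed(seed):
    """Project `vectors` to 2-D with t-SNE, seeding its random number generator."""
    tsne = TSNE(
        n_components=2,
        perplexity=10,      # must be smaller than the number of points (40)
        init="random",      # random starting layout, so the seed matters
        random_state=seed,  # fixes the random number generator
    )
    return tsne.fit_transform(vectors)


run_a = embed(42)
run_b = embed(42)  # same seed as run_a
run_c = embed(7)   # different seed

print("same seed, same layout:", np.allclose(run_a, run_b))
print("different seed, same layout:", np.allclose(run_a, run_c))
```

Fixing the seed only makes the picture repeatable; it does not make one seed's layout more "correct" than another's.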

Upvotes: 0

Nischal Sanil

Reputation: 155

t-SNE is a non-deterministic dimensionality reduction technique, so different runs with the same hyperparameters may produce different outputs. These outputs are likely to be very similar in structure (which points end up close together), even though the absolute positions and orientation of the points change from run to run. t-SNE is a very popular dimensionality reduction technique because it handles non-linear data effectively and preserves local structure (and, to a lesser degree, global structure), which makes it very reliable. However, these plots can be tricky and even misleading to interpret without proper hyperparameter tuning.
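One way to check the "different coordinates, similar structure" point is sketched below. It is an illustration under stated assumptions, not part of the original answer: synthetic clustered data from make_blobs stands in for word vectors, and scikit-learn's trustworthiness score is used as one possible measure of how well local neighbourhoods survive the projection. Each seed gives different coordinates, while the scores are expected to stay comparable.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE, trustworthiness

# Hypothetical stand-in for word vectors: 90 points in 50-D forming 3 clusters.
X, _ = make_blobs(n_samples=90, n_features=50, centers=3, random_state=0)

layouts = {}
scores = {}
for seed in (0, 1, 2):
    # Same hyperparameters every run; only the seed changes.
    Y = TSNE(n_components=2, perplexity=15, init="random",
             random_state=seed).fit_transform(X)
    layouts[seed] = Y
    # Trustworthiness lies in [0, 1]; 1 means each point's 5 nearest
    # neighbours in the 2-D layout were also among its 5 nearest neighbours in X.
    scores[seed] = trustworthiness(X, Y, n_neighbors=5)

for seed in layouts:
    print(f"seed={seed}: first point at {layouts[seed][0].round(2)}, "
          f"trustworthiness={scores[seed]:.3f}")
```

Comparing a neighbourhood-based score rather than raw coordinates is deliberate: t-SNE layouts can be rotated or mirrored between runs without any change in which points are neighbours.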

For more information on how to interpret t-SNE plots, I highly recommend reading this post, where the effective use of t-SNE is explained brilliantly with interactive visualizations.

Upvotes: 1
