Reputation: 335
I have created a word2vec model and visualized the top n similar words for a particular term using TSNE and matplotlib. What I do not understand is that when I run it multiple times, the same words are plotted at different positions, even though the words and vectors are the same each time. Why is this the case? I have a feeling it has to do with the way TSNE reduces the dimensionality of the vectors. If so, is this method of visualization really reliable, given that the plot will be different every time?
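import matplotlib.pyplot as plt
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

model = Word2Vec.load("a_w2v_model")
topn_words_list = [x[0] for x in model.wv.most_similar("king", topn=3)]
topn_vectors_list = model.wv[topn_words_list]  # index through .wv; direct model[...] indexing is deprecated
# Note: recent scikit-learn releases require perplexity < number of samples and rename n_iter to max_iter.
tsne = TSNE(n_components=2, verbose=1, perplexity=27, n_iter=300)
Y = tsne.fit_transform(topn_vectors_list)
fig, ax = plt.subplots()
ax.plot(Y[:, 0], Y[:, 1], 'o')
ax.set_yticklabels([])  # Hide tick labels
ax.set_xticklabels([])  # Hide tick labels
for i, word in enumerate(topn_words_list):
    ax.annotate(word, xy=(Y[i, 0], Y[i, 1]))
plt.show()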
Upvotes: 1
Views: 406
Reputation: 16966
As mentioned by @Nischal Sanil, T-SNE is a non-deterministic dimensionality reduction technique. That is why the TSNE implementation in sklearn has a parameter called random_state. Hence, to get the same result every time, set random_state to some fixed value.
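A minimal sketch of the effect, not the asker's actual setup: it uses random stand-in vectors instead of a trained word2vec model, and it passes init="random" and method="exact" explicitly (both are my choices for the illustration) so that the seed is the only thing that varies between runs.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in data (an assumption): 40 random 50-dimensional vectors
# in place of real word vectors.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(40, 50))

def embed(seed):
    # perplexity is kept below the number of samples (40 here).
    tsne = TSNE(n_components=2, perplexity=10, init="random",
                method="exact", random_state=seed)
    return tsne.fit_transform(vectors)

same_a = embed(42)   # fixed seed
same_b = embed(42)   # same seed again
other = embed(7)     # different seed

print("same seed, same layout:", np.allclose(same_a, same_b))
print("different seed, same layout:", np.allclose(same_a, other))
```

With the seed fixed, the two layouts should coincide on the same machine and library version; changing the seed changes the random initialization and therefore the final coordinates.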
Upvotes: 0
Reputation: 155
TSNE is a non-deterministic dimensionality reduction technique. Hence, different runs with the same hyperparameters may produce different outputs, although they are likely to be very similar. TSNE is a very popular dimensionality reduction technique because it handles non-linear data well and preserves local structure (and, to a lesser extent, global structure), which makes it a reliable choice for visualization. But these plots can be tricky and misleading to interpret without proper hyperparameter tuning.
For more information on how to interpret TSNE plots, I highly recommend you read this post, where the effective use of TSNE is explained brilliantly with interactive visualizations.
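One way to check the "different but similar" claim is to compare runs by how well each preserves neighborhoods rather than by raw coordinates. The sketch below is only an illustration under stated assumptions: the data are synthetic clusters standing in for word vectors, init="random" and method="exact" are chosen by me so that the seed matters, and sklearn's trustworthiness score is used as one possible measure of local-structure preservation.

```python
import numpy as np
from sklearn.manifold import TSNE, trustworthiness

rng = np.random.default_rng(0)
# Hypothetical stand-in for word vectors: three loose clusters
# of 20 points each in 50 dimensions.
centers = rng.normal(scale=5.0, size=(3, 50))
vectors = np.vstack([c + rng.normal(size=(20, 50)) for c in centers])

embeddings = {}
scores = {}
for seed in (0, 1, 2):
    tsne = TSNE(n_components=2, perplexity=10, init="random",
                method="exact", random_state=seed)
    embeddings[seed] = tsne.fit_transform(vectors)
    # Trustworthiness lies in [0, 1]; higher means the 2-D neighbors
    # are more often true neighbors in the original space.
    scores[seed] = trustworthiness(vectors, embeddings[seed], n_neighbors=5)
    print(f"seed={seed} trustworthiness={scores[seed]:.3f}")
```

If the scores are close across seeds while the coordinates differ, the runs agree on which points are neighbors even though the absolute positions (rotation, reflection, cluster placement) do not match. Note that recent scikit-learn releases default to init="pca", which tends to make layouts much less sensitive to the seed.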
Upvotes: 1