Reputation: 187
I have a list of ~20k word vectors ('tuple_vectors'), with no labels; each one looks like the following:
[-2.84658718e+00 -7.74899840e-01 -2.24296474e+00 -8.69364500e-01
3.90927410e+00 -2.65316987e+00 -9.71897244e-01 -2.40408254e+00
1.16272974e+00 -2.61649752e+00 -2.87350488e+00 -1.06603658e+00
2.93374014e+00 1.07194626e+00 -1.86619771e+00 1.88549474e-01
-1.31901133e+00 3.83382154e+00 -3.46174908e+00 ...
Is there a quick, concise way to visualise them using t-SNE?
I've tried the following:
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

n_sne = 21060
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(tuple_vectors)

# plot the 2-D embedding
plt.scatter(tsne_results[:, 0], tsne_results[:, 1])
plt.show()
Upvotes: 3
Views: 1789
Reputation: 3235
If you are vectorizing your text first, I suggest using the yellowbrick library. Since t-SNE is very expensive, TSNEVisualizer in yellowbrick applies a simpler decomposition ahead of time (SVD with 50 components by default) and then performs the t-SNE embedding. The visualizer then plots a scatter plot, which can be colored by cluster or by class. Here is a simple example using TfidfVectorizer:
from yellowbrick.text import TSNEVisualizer
from sklearn.feature_extraction.text import TfidfVectorizer

# vectorize the text (sample_text is your list of raw documents)
tfidf = TfidfVectorizer()
tuple_vectors = tfidf.fit_transform(sample_text)

# Create the visualizer and draw the vectors
tsne = TSNEVisualizer()
tsne.fit(tuple_vectors)
tsne.poof()  # use tsne.show() on newer yellowbrick versions
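If you already have dense word vectors rather than raw text, you can apply the same idea directly in scikit-learn: reduce to ~50 dimensions first, then run t-SNE and scatter the result. A minimal sketch, assuming tuple_vectors is a 2-D NumPy array of shape (n_words, n_dims):
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# cheap decomposition first (mirrors TSNEVisualizer's 50-component default)
reduced = TruncatedSVD(n_components=50).fit_transform(tuple_vectors)

# then the expensive t-SNE embedding down to 2-D
embedded = TSNE(n_components=2, perplexity=40, verbose=1).fit_transform(reduced)

plt.scatter(embedded[:, 0], embedded[:, 1], s=2)
plt.show()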
Upvotes: 5