konstantin_doncov

Reputation: 2879

Adding new points to the t-SNE model

I am trying to use the t-SNE algorithm in scikit-learn:

import numpy as np
from sklearn.manifold import TSNE
X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
model = TSNE(n_components=2, random_state=0)
np.set_printoptions(suppress=True)
model.fit_transform(X) 

Output:

array([[ 0.00017599,  0.00003993], #1
       [ 0.00009891,  0.00021913], 
       [ 0.00018554, -0.00009357],
       [ 0.00009528, -0.00001407]]) #2

After that, I try to add some points to the existing model, with coordinates identical to points already in the first array X:

Y = np.array([[0, 0, 0], [1, 1, 1]])
model.fit_transform(Y) 

Output:

array([[ 0.00017882,  0.00004002], #1
       [ 0.00009546,  0.00022409]]) #2

But the coordinates in the second array are not equal to the first and last coordinates of the first array.

I understand that this is the expected behaviour, but how can I add new points to the model and get the same output coordinates for the same input coordinates?

I also still need to be able to find the closest points after appending new points.

Upvotes: 8

Views: 3734

Answers (1)

Kilian Obermeier

Reputation: 7168

Quoting the author of t-SNE from here: https://lvdmaaten.github.io/tsne/

Once I have a t-SNE map, how can I embed incoming test points in that map?

t-SNE learns a non-parametric mapping, which means that it does not learn an explicit function that maps data from the input space to the map. Therefore, it is not possible to embed test points in an existing map (although you could re-run t-SNE on the full dataset). A potential approach to deal with this would be to train a multivariate regressor to predict the map location from the input data. Alternatively, you could also make such a regressor minimize the t-SNE loss directly, which is what I did in this paper.
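A minimal sketch of the regressor idea from the quote, under my own assumptions: it uses scikit-learn's KNeighborsRegressor with a single neighbour (my choice of regressor, not something the t-SNE author prescribes) and 60 random 3-D points instead of the 4 points from the question, because recent scikit-learn versions require perplexity to be smaller than the number of samples. The names rng, regressor, Y_embedded and Z_embedded are just for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsRegressor

# Toy data (an assumption for this sketch): 60 random points in 3-D.
rng = np.random.RandomState(0)
X = rng.rand(60, 3)

# Fit t-SNE once and keep the resulting 2-D map.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
embedding = tsne.fit_transform(X)

# Train a multivariate regressor that maps input space -> map coordinates.
# With n_neighbors=1, a point that was in the training set is mapped to
# the map location of its stored copy.
regressor = KNeighborsRegressor(n_neighbors=1)
regressor.fit(X, embedding)

# Points that already exist in X: predicted from the regressor, no refit.
Y = X[[0, -1]]
Y_embedded = regressor.predict(Y)
print(np.allclose(Y_embedded, embedding[[0, -1]]))

# Genuinely new points are placed at the location of their nearest
# training point in the input space.
Z = rng.rand(5, 3)
Z_embedded = regressor.predict(Z)
print(Z_embedded.shape)
```

The original map is never recomputed here, so nearest-neighbour queries on embedding keep working after new points are added. A 1-nearest-neighbour regressor is a crude choice, since it stacks new points on top of existing ones; a smoother regressor would spread them out but would no longer reproduce the training coordinates exactly.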

Also, this answer on stats.stackexchange.com contains ideas and a link to openTSNE (https://github.com/pavlin-policar/openTSNE), which it describes as "a very nice and very fast recent Python implementation of t-SNE" that "allows embedding of new points out of the box", and a link to https://github.com/berenslab/rna-seq-tsne/.

Upvotes: 5
