Is there a way to assess similarity between TSNE plots in python?

Question

I wanted to know if there is a way to assess the similarity between two t-SNE plots, or is this not possible because of the random nature of how these plots are generated? I did some research and found the Hungarian assignment algorithm for mapping points from one plot to another, but I'm not sure how to apply it or whether this is the right approach.

Also, is there a way to plot two sets of data under the same t-SNE conditions? I.e. if I have a df with two sets of data concatenated into one, can I somehow separate them at the end to show two distinct plots that share the same x,y embedding space?

ach · Accepted Answer

If the same points are embedded in both plots, one option might be to label each point with its nearest group in each t-SNE embedding/plot, which you could do by running a clustering/k-nearest neighbours algorithm on the embeddings. Once you have these two sets of labels for the points, you could use the adjusted Rand score to compare the similarity of the labellings, i.e. determine whether points were assigned to similar groups/clusters in the two t-SNE embeddings.
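A minimal sketch of that idea, assuming scikit-learn is available and using k-means as the clustering step. The synthetic `make_blobs` data, the choice of 4 clusters and the two seeds are illustrative assumptions, not part of the question. Because the adjusted Rand index ignores how cluster IDs are numbered, you don't need a Hungarian-style matching of cluster labels before computing it.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score

# Hypothetical example data: 300 points in 10 dimensions drawn from 4 blobs.
X, _ = make_blobs(n_samples=300, n_features=10, centers=4, random_state=0)

# Two t-SNE runs on the SAME points, differing only in the random seed.
emb_a = TSNE(n_components=2, perplexity=30, random_state=1).fit_transform(X)
emb_b = TSNE(n_components=2, perplexity=30, random_state=2).fit_transform(X)

# Label each point with a cluster in each embedding (k=4 is an assumption).
labels_a = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(emb_a)
labels_b = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(emb_b)

# ARI: 1.0 = identical groupings (up to relabelling), ~0.0 = chance-level agreement.
score = adjusted_rand_score(labels_a, labels_b)
print(f"Adjusted Rand index between the two embeddings: {score:.3f}")
```

In real data you would pick the number of clusters to suit your problem; the score only tells you whether the two embeddings group the points similarly, not whether the coordinates themselves match.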
Apologies in advance if I haven't fully understood your second question; I guess it depends on what you really want to show by doing that. If you run t-SNE on the overall dataframe, you could "separate" the two datasets afterwards, either by colouring them differently on the plot or by plotting each subset in its own panel using the shared coordinates. The embeddings would obviously be different if you performed t-SNE on each dataframe separately: you can hold the parameters fixed and set the same random seed, but because t-SNE is stochastic and each run only preserves neighbourhoods within its own input, I don't think it's very meaningful to compare the "x,y embedding space" of the two runs.
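A rough sketch of the "embed once, split afterwards" approach, assuming pandas, matplotlib and scikit-learn. The two DataFrames, their column names (`f0` to `f4`) and the `source` label column are hypothetical stand-ins for your own data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Hypothetical data: two DataFrames with the same 5 feature columns.
cols = [f"f{i}" for i in range(5)]
df1 = pd.DataFrame(rng.normal(0.0, 1.0, size=(150, 5)), columns=cols)
df2 = pd.DataFrame(rng.normal(1.5, 1.0, size=(150, 5)), columns=cols)

# Concatenate, remembering which dataset each row came from.
combined = pd.concat(
    [df1.assign(source="set1"), df2.assign(source="set2")], ignore_index=True
)

# One t-SNE run over all rows, so both sets live in a single x,y space.
emb = TSNE(n_components=2, random_state=0).fit_transform(combined[cols].to_numpy())
combined["x"], combined["y"] = emb[:, 0], emb[:, 1]

# Split back out after embedding: one panel per dataset, with shared axes.
fig, axes = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(10, 4))
for ax, (name, group) in zip(axes, combined.groupby("source")):
    ax.scatter(group["x"], group["y"], s=8)
    ax.set_title(name)
fig.savefig("tsne_split.png")
```

Sharing the axes keeps the two panels on identical scales, so positions are directly comparable between them; drawing both groups on one axis with different colours is the single-plot alternative.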
