Reputation: 809
I'm trying to plot the centroids and clusters of my k-means analysis, using the following code:
matrix_reduced = TruncatedSVD(n_components = num_k).fit_transform(matrix)
matrix_embedded = TSNE(n_components=2, perplexity=30,verbose=2, n_iter =500).fit_transform(matrix_reduced)
centroids = kmeans.cluster_centers_
centroids_embedded = TSNE(n_components=2).fit_transform(order_centroids)
fig = plt.figure(figsize=(10,10))
ax1 = fig.add_subplot(111)
ax1.scatter(matrix_embedded[:,0], matrix_embedded[:,1],marker='x',c = kmeans.labels_)
ax1.scatter(centroids_embedded[:,0], centroids_embedded[:,1],marker='o',c = 'red')
plt.show()
Unfortunately, the centroids are not centered at the different clusters:
Question: Does anyone know what could cause this? I have no idea what's going wrong.
Thanks!
Upvotes: 0
Views: 894
Reputation: 1725
In general, when creating a manifold embedding, you need to provide all the points you want to represent in it at once, because the final representation usually depends on all the points in your data.
In the example, you are creating two different manifolds:
matrix_reduced = TruncatedSVD(n_components = num_k).fit_transform(matrix)
# first manifold
matrix_embedded = TSNE(n_components=2, perplexity=30,verbose=2, n_iter =500).fit_transform(matrix_reduced)
centroids = kmeans.cluster_centers_
# second manifold
centroids_embedded = TSNE(n_components=2).fit_transform(order_centroids)
This means that the two representations are independent, and that's why you don't see the centroids centered: they are, in fact, in a different space.
The way of fixing this is to simply join both matrix_reduced and order_centroids into a single dataset, and apply TSNE only once. That should show the result you are expecting.
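As a rough illustration of the joint embedding described above, here is a minimal sketch. It assumes k-means is fitted on matrix_reduced and that the centroids come from kmeans.cluster_centers_ (the `centroids` variable in the question). The synthetic `matrix`, the value of `num_k`, and the `random_state` and `n_init` arguments are placeholders added for the example, not part of the original code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE

# Synthetic stand-in for the asker's `matrix` (assumption: three loose blobs).
rng = np.random.default_rng(0)
matrix = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(50, 20)) for c in (0.0, 4.0, 8.0)]
)

num_k = 3  # hypothetical value, reused for SVD components and cluster count

# Reduce first, then cluster in the reduced space so that the centroids
# and the data points have the same number of columns.
matrix_reduced = TruncatedSVD(n_components=num_k, random_state=0).fit_transform(matrix)
kmeans = KMeans(n_clusters=num_k, n_init=10, random_state=0).fit(matrix_reduced)
centroids = kmeans.cluster_centers_

# Stack data and centroids and run t-SNE once on the combined array.
combined = np.vstack([matrix_reduced, centroids])
combined_embedded = TSNE(
    n_components=2, perplexity=30, random_state=0
).fit_transform(combined)

# Split the single embedding back into data points and centroids.
n_points = matrix_reduced.shape[0]
matrix_embedded = combined_embedded[:n_points]
centroids_embedded = combined_embedded[n_points:]
```

The resulting matrix_embedded and centroids_embedded can then be passed to the two ax1.scatter calls from the question unchanged, since both now come from the same t-SNE fit.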
Also, note that if you are using k-means on the original matrix (instead of on matrix_reduced) then the result will still be incorrect - you need to apply the same transformations to both your centroids and the data that k-means saw originally.
So in summary (and assuming you want to use TruncatedSVD before the clustering), it would work as follows:
1. Apply TruncatedSVD to transform the whole dataset at once.
2. Apply TSNE to the whole dataset.
Upvotes: 1