Reputation: 7245
I have two observations of the same event, say X and Y. I assume there are nc clusters. I am using sklearn to do the clustering.
x = KMeans(n_clusters=nc).fit_predict(X)
y = KMeans(n_clusters=nc).fit_predict(Y)
Is there a measure that allows me to compare x and y, i.e. a measure that equals 1 if the clusterings x and y are the same?
Upvotes: 0
Views: 2948
Reputation: 453
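The Rand Index and its adjusted version do exactly this. Two cluster assignments that match get a score of 1, even if the label values themselves, which are treated as arbitrary, differ. For the plain Rand Index, a value of 0 means the two assignments disagree on every pair of points. The Adjusted Rand Index is corrected for chance: its baseline is a random assignment of points to clusters, which scores close to 0, and it can go negative when the agreement is worse than chance. Both measures require that the two label arrays describe the same points in the same order.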
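To make the definition concrete, here is a minimal pure-Python sketch of the Adjusted Rand Index computed from pair counts. The function name `adjusted_rand_index` is my own, not from any library; in practice `sklearn.metrics.adjusted_rand_score` computes the same quantity, and this version is only meant to show what is being counted.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index of two labelings of the same points (illustrative helper)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: trivially identical

    # Contingency counts: how many points fall in cluster i of a and cluster j of b.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Pairs of points grouped together in both labelings, in a only, in b only.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_a = sum(comb(c, 2) for c in a_counts.values())
    together_b = sum(comb(c, 2) for c in b_counts.values())

    expected = together_a * together_b / total_pairs  # chance baseline
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)


# Same partition, different label names.
same = adjusted_rand_index([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])
# Every pair grouped in one labeling is split in the other.
opposite = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Here `same` is 1.0 because only the label names differ, while `opposite` is negative because the agreement is below the chance baseline.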
Upvotes: 1
Reputation: 33522
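Just extract the cluster centers of your fitted KMeans objects (see the docs). Note that fit_predict returns only the label array, so you need to keep the estimators themselves:
km_x = KMeans(n_clusters=nc).fit(X)
km_y = KMeans(n_clusters=nc).fit(Y)
x_centers = km_x.cluster_centers_
y_centers = km_y.cluster_centers_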
Then you have to decide which metric to use to compare them. Keep in mind that the centers are floating-point values, that k-means is a heuristic, and that its initialization is random. The order in which the centers are returned is arbitrary as well. This means that, with high probability, you will get results that are not exactly equal, even for models trained on the same data.
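One way to handle both the arbitrary ordering and the floating-point noise is to pair the centers one-to-one and look at the smallest achievable mean distance. The following is only a sketch under my own assumptions: Euclidean distance as the metric, a small number of clusters (it tries all k! pairings), and a hypothetical helper name `min_matched_distance`. It takes plain lists of coordinates, so pass e.g. `cluster_centers_.tolist()`.

```python
from itertools import permutations
from math import dist


def min_matched_distance(centers_a, centers_b):
    """Smallest mean Euclidean distance over all one-to-one pairings of centers."""
    if len(centers_a) != len(centers_b):
        raise ValueError("both models must have the same number of centers")

    k = len(centers_a)
    best = float("inf")
    # Brute force over pairings: fine for a handful of clusters only.
    for perm in permutations(range(k)):
        total = sum(dist(centers_a[i], centers_b[j]) for i, j in enumerate(perm))
        best = min(best, total / k)
    return best


# Same centers listed in a different order: distance 0.
reordered = min_matched_distance([(0.0, 0.0), (10.0, 10.0)],
                                 [(10.0, 10.0), (0.0, 0.0)])
# One center shifted by 1 unit: mean distance 0.5 over the two centers.
shifted = min_matched_distance([(0.0, 0.0), (10.0, 10.0)],
                               [(10.0, 10.0), (0.0, 1.0)])
```

The result can then be compared against a tolerance of your choosing rather than tested for exact equality. For larger numbers of clusters, an assignment solver such as `scipy.optimize.linear_sum_assignment` avoids the factorial cost.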
This link discusses some approaches and the problems.
Upvotes: 2