Reputation: 123
I have worked out the optimal number of clusters for the k-means algorithm in Python. How can I now find the exact size of each cluster (the number of points assigned to it) after applying k-means?
Here's a code snippet:
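import math
import numpy as np
from scipy import cluster
from scipy.cluster.vq import kmeans, vq

# one row per observation: (asset id, user id), cast to float for k-means
data = np.column_stack((simpleassetid_arr, simpleuidarr)).astype(float)
# rule-of-thumb starting point: k = sqrt(n / 2)
centroids, _ = kmeans(data, int(round(math.sqrt(len(uidarr) / 2.0))))
idx, _ = vq(data, centroids)
# run k-means for k = 1..9; the distortions are used to pick k with the elbow test
initial = [cluster.vq.kmeans(data, i) for i in range(1, 10)]
var = [var for (cent, var) in initial]
num_k = int(input("Enter the number of clusters: "))
cent, var = initial[num_k - 1]
assignment, cdist = cluster.vq.vq(data, cent)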
Upvotes: 3
Views: 7503
Reputation: 21766
You can get the cluster size using this:
print(np.bincount(idx))
For the example below, np.bincount(idx) outputs an array of two elements, one count per cluster, e.g. [156 144].
import numpy as np
from scipy.cluster.vq import kmeans, vq

# data generation: two groups of 150 points, the first shifted by (0.5, 0.5)
data = np.vstack((np.random.rand(150, 2) + np.array([.5, .5]),
                  np.random.rand(150, 2)))
# computing k-means with K = 2 (2 clusters)
centroids, _ = kmeans(data, 2)
# assign each sample to its nearest centroid
idx, _ = vq(data, centroids)
# print the number of elements per cluster
print(np.bincount(idx))
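One caveat with a plain np.bincount(idx) is that a trailing cluster with no points is left out of the result. The sketch below wraps the same idea in a helper that always returns one count per centroid. The name cluster_sizes is hypothetical (not part of SciPy or NumPy), and the random data only stands in for the real observations.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq


def cluster_sizes(data, centroids):
    """Hypothetical helper: number of observations assigned to each centroid."""
    # vq maps every row of `data` to the index of its nearest centroid
    labels, _ = vq(data, centroids)
    # minlength keeps a zero entry for centroids that received no points
    return np.bincount(labels, minlength=len(centroids))


# stand-in data: two groups of 150 points, the first shifted by (0.5, 0.5)
rng = np.random.RandomState(0)
data = np.vstack((rng.rand(150, 2) + 0.5, rng.rand(150, 2)))

centroids, _ = kmeans(data, 2)
sizes = cluster_sizes(data, centroids)
print(sizes)

# alternative: pair each label that actually occurs with its count
labels, counts = np.unique(vq(data, centroids)[0], return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

With the question's variables, the same call would presumably be cluster_sizes(data, cent), or equivalently np.bincount(assignment, minlength=len(cent)).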
Upvotes: 3