Finding the size of a specific k-means cluster

Question

I've been having trouble with this for a while and I just cannot seem to find a way to get the number of data points within a specific cluster. Here's what I have so far:

This first chunk outputs the number of data points in each of my 8 clusters:

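from sklearn.cluster import KMeans

def CountFrequency(my_list):
    # Tally how many times each cluster label occurs
    freq = {}
    for item in my_list:
        if item in freq:
            freq[item] += 1
        else:
            freq[item] = 1

    # Print one "label : count" line per cluster. Nothing is returned,
    # so callers of this function get None.
    for key, value in freq.items():
        print("%d : %d" % (key, value))

def clusterCounts(df):
    # Replace missing values with each column's mean before clustering
    df3 = df.fillna(df.mean())
    array3 = df3[['column1', 'column2', 'column3']].values
    kmeans = KMeans(n_clusters=8, random_state=42)
    kmeans.fit(array3)
    return CountFrequency(kmeans.labels_)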

Which results in:

(I'm not sure why the None is there, but I think that's a minor issue.)
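The hand-rolled dictionary in CountFrequency can be replaced by the standard library's collections.Counter. This is a minimal sketch, not the poster's code: count_frequency is a hypothetical helper, the labels are made-up stand-ins for kmeans.labels_, and it returns the counts (sorted by label) instead of only printing them, so the caller gets a dict rather than None.

```python
from collections import Counter

def count_frequency(labels):
    """Hypothetical helper: map each cluster label to its number of points."""
    counts = Counter(labels)
    # Sort by label so the keys come out in ascending order (0, 1, 2, ...)
    return dict(sorted(counts.items()))

# Made-up labels standing in for kmeans.labels_
labels = [0, 2, 1, 2, 2, 0]
sizes = count_frequency(labels)
for label, size in sizes.items():
    print(f"{label} : {size}")
```

Returning the dict lets the counts be reused later, for example to look up the size of one particular label.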

My next code chunk prints the centroid for each of my 8 clusters:

def clusters(df):
    df3 = df.fillna(df.mean())
    array3 = df3[['column1', 'column2', 'column3']].values
    kmeans = KMeans(n_clusters=8, random_state=42)
    kmeans.fit(array3)
    # One row per cluster: the coordinates of that cluster's centroid
    centers = kmeans.cluster_centers_
    return centers

Results in:

[[49.2  2.4 48.4]
 [18.9 18.9 62.1]
 [ 0.2  0.4 99.4]
 [ 1.1 98.3  0.6]
 [98.2  1.   0.9]
 [33.3 32.7 34. ]
 [27.   1.2 71.7]
 [ 3.6 51.9 44.5]]
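In scikit-learn's KMeans, each value in labels_ is the row index of that point's centroid in cluster_centers_, so per-cluster sizes can be lined up with the centroid rows directly. The sketch below uses np.bincount on made-up arrays standing in for the fitted model's attributes; sizes_by_centroid is a hypothetical helper name.

```python
import numpy as np

def sizes_by_centroid(centers, labels):
    """Hypothetical helper: pair each centroid row with its cluster size.

    Assumes labels[j] is the row index in `centers` of point j's cluster,
    which is how KMeans.labels_ relates to KMeans.cluster_centers_.
    """
    centers = np.asarray(centers, dtype=float)
    # Entry i of bincount is the number of occurrences of label i;
    # minlength keeps one entry per centroid even if a cluster is empty
    sizes = np.bincount(np.asarray(labels, dtype=int), minlength=len(centers))
    return list(zip(centers.tolist(), sizes.tolist()))

# Made-up stand-ins for kmeans.cluster_centers_ and kmeans.labels_
centers = [[49.2, 2.4, 48.4], [33.3, 32.7, 34.0], [1.1, 98.3, 0.6]]
labels = [1, 0, 1, 2, 1]
for i, (center, size) in enumerate(sizes_by_centroid(centers, labels)):
    print("Cluster", i, "center", center, "size", size)
```

Using minlength matters when a cluster ends up with no points: without it, the size list could be shorter than the number of centroid rows and the pairing would silently drop the trailing clusters.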

I want to find out how many data points are in the cluster whose centroid is [33.3 32.7 34. ]. How can I isolate that centroid's cluster to get the number of data points it contains?

As a secondary question: do the keys in the first results I posted (the number of data points per cluster) line up with the order of the centroids above? I hope this is clear, and thank you in advance!

Has QUIT--Anony-Mousse · Accepted Answer

Why not simply loop over the cluster indices and count the matching labels:

for i in range(len(kmeans.cluster_centers_)):
    print("Cluster", i)
    print("Center:", kmeans.cluster_centers_[i])
    # labels_ == i is a boolean array; its sum is the number of points in cluster i
    print("Size:", (kmeans.labels_ == i).sum())

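This works because summing a boolean array counts each True as 1 and each False as 0. It also answers the second question: label i in kmeans.labels_ refers to row i of kmeans.cluster_centers_, so a key k printed by CountFrequency is the size of the cluster with centroid kmeans.cluster_centers_[k]. CountFrequency prints keys in order of first appearance rather than sorted, so match on the key, not on the line position. (Your two functions fit separate models, but with the same data and random_state=42 they should produce the same clustering.)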
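To pick out the [33.3 32.7 34. ] cluster specifically, one option is to find the centroid row closest to that vector and count the points carrying its index. This is a minimal sketch using NumPy only: cluster_size_near is a hypothetical helper, the centroids are the rounded values printed in the question, and the labels are made up in place of kmeans.labels_.

```python
import numpy as np

def cluster_size_near(centers, labels, target):
    """Hypothetical helper: find the cluster whose centroid is nearest to
    `target` and return (cluster index, number of points in it)."""
    centers = np.asarray(centers, dtype=float)
    target = np.asarray(target, dtype=float)
    # Euclidean distance from the target to every centroid row
    dists = np.linalg.norm(centers - target, axis=1)
    idx = int(np.argmin(dists))
    # A point belongs to that cluster when its label equals the row index
    size = int((np.asarray(labels) == idx).sum())
    return idx, size

# The rounded centroids printed in the question
centers = [[49.2, 2.4, 48.4], [18.9, 18.9, 62.1], [0.2, 0.4, 99.4],
           [1.1, 98.3, 0.6], [98.2, 1.0, 0.9], [33.3, 32.7, 34.0],
           [27.0, 1.2, 71.7], [3.6, 51.9, 44.5]]
labels = [5, 5, 0, 3, 5, 2]  # made-up stand-in for kmeans.labels_
idx, size = cluster_size_near(centers, labels, [33.3, 32.7, 34.0])
print("Cluster", idx, "has", size, "points")
```

Matching by nearest distance rather than exact equality matters because the printed centroids are rounded. With the fitted model itself, kmeans.predict([[33.3, 32.7, 34.0]])[0] should give the same label, since predict assigns each input row to its closest centroid.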
