man.utd_21

Reputation: 19

Extracting centroids together with their data points using K-Medoids clustering in Python?

I have a 1D array X with 10 elements. I applied KMedoids clustering to this data with 3 clusters, which gave me the cluster labels (IDs) and a centroid for each cluster.

from sklearn.metrics import silhouette_samples
from sklearn_extra.cluster import KMedoids
import pandas as pd
import numpy as np

X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932, 
              0.85042336, 0.84899714, 0.82019115, 0.86112067, 0.8312496 ])
X = X.reshape(-1, 1)

model1 = KMedoids(n_clusters=3, random_state=0).fit(X)
cluster_labels = model1.predict(X)  
clusters, counts = np.unique(cluster_labels[cluster_labels>=0], 
                             return_counts=True)
centroids = np.array(model1.cluster_centers_)

print("For centroids", centroids) 
print("***************")
for i in range(len(X)):
    print(i, X[i])

The output of this code is

For centroids [[0.85566274]
     [0.85042336]
     [0.82019115]]
    ***************
    0 [0.85142858]
    1 [0.85566274]
    2 [0.85364912]
    3 [0.81536489]
    4 [0.84929932]
    5 [0.85042336]
    6 [0.84899714]
    7 [0.82019115]
    8 [0.86112067]
    9 [0.8312496]

However, I want to display each centroid together with the index and value of its data point. For example:

For centroid [0.85566274], 1 [0.85566274]
For centroid [0.85042336], 5 [0.85042336]
For centroid [0.82019115], 7 [0.82019115]

How can I achieve this?

Upvotes: 1

Views: 1816

Answers (1)

Tonechas

Reputation: 13723

You can print a table with labels, medoids and indices as columns like this:

import numpy as np
from sklearn_extra.cluster import KMedoids

X = np.array([[0.85142858],
              [0.85566274],
              [0.85364912],
              [0.81536489],
              [0.84929932],
              [0.85042336],
              [0.84899714],
              [0.82019115],
              [0.86112067],
              [0.8312496 ]])

kmedoids = KMedoids(n_clusters=3, random_state=0).fit(X)

print('Label   Medoid        Index')
print('---------------------------')
for index in kmedoids.medoid_indices_:
    label = kmedoids.labels_[index]
    medoid = X[index]
    print(f'{label:<7} {medoid}  {index}')

Output

Label   Medoid        Index
---------------------------
0       [0.85566274]  1
1       [0.85042336]  5
2       [0.82019115]  7

Alternatively, you could store the results in a pandas dataframe, as per your request:

import pandas as pd

df = pd.DataFrame({'label': kmedoids.labels_[kmedoids.medoid_indices_],
                   'medoid': np.squeeze(X[kmedoids.medoid_indices_]),
                   'index': kmedoids.medoid_indices_})
print(df)

Output

   label    medoid  index
0      0  0.855663      1
1      1  0.850423      5
2      2  0.820191      7
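
If you only have the centroid values at hand (as in `model1.cluster_centers_` from the question) rather than `medoid_indices_`, you can still recover the row each medoid came from, because a medoid is always one of the original data points. The following is a minimal NumPy-only sketch: the centroid values are copied from the output printed in the question, and `distances` and `medoid_rows` are names I made up for illustration.

```python
import numpy as np

X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932,
              0.85042336, 0.84899714, 0.82019115, 0.86112067, 0.8312496])
X = X.reshape(-1, 1)

# Medoid values copied from the question's printed output
# (there they come from model1.cluster_centers_).
centroids = np.array([[0.85566274],
                      [0.85042336],
                      [0.82019115]])

# Pairwise distances between every centroid and every data point,
# shape (n_clusters, n_samples). A medoid has distance 0 to its own row.
distances = np.linalg.norm(X[None, :, :] - centroids[:, None, :], axis=2)

# Index of the closest data point for each centroid.
medoid_rows = distances.argmin(axis=1)

for centroid, row in zip(centroids, medoid_rows):
    print(f"For centroid {centroid}, {row} {X[row]}")
```

Note that if X contains duplicate rows, `argmin` returns the first matching row, which may differ from the index stored in `medoid_indices_`.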

Upvotes: 3
