Reputation: 19
I have some data in a 1D array X with 10 elements. I applied KMedoids clustering to this data with 3 clusters. After fitting KMedoids, I got the cluster labels (IDs) and the centroid of each cluster.
from sklearn.metrics import silhouette_samples
from sklearn_extra.cluster import KMedoids
import pandas as pd
import numpy as np
X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932,
0.85042336, 0.84899714, 0.82019115, 0.86112067, 0.8312496 ])
X = X.reshape(-1, 1)
model1 = KMedoids(n_clusters=3, random_state=0).fit(X)
cluster_labels = model1.predict(X)
clusters, counts = np.unique(cluster_labels[cluster_labels>=0],
return_counts=True)
centroids = np.array(model1.cluster_centers_)
print("For centroids", centroids)
print("***************")
for i in range(len(X)):
    print(i, X[i])
The output of this code is:
For centroids [[0.85566274]
[0.85042336]
[0.82019115]]
***************
0 [0.85142858]
1 [0.85566274]
2 [0.85364912]
3 [0.81536489]
4 [0.84929932]
5 [0.85042336]
6 [0.84899714]
7 [0.82019115]
8 [0.86112067]
9 [0.8312496]
However, I want to display each centroid together with its data point and that point's index. For example:
For centroid [0.85566274] , 1 [0.85566274]
For centroid [0.85042336] , 5 [0.85042336]
For centroid [0.82019115] , 7 [0.82019115]
How can I achieve this?
Upvotes: 1
Views: 1816
Reputation: 13723
You can print a table with labels, medoids and indices as columns like this:
import numpy as np
from sklearn_extra.cluster import KMedoids
X = np.array([[0.85142858],
[0.85566274],
[0.85364912],
[0.81536489],
[0.84929932],
[0.85042336],
[0.84899714],
[0.82019115],
[0.86112067],
[0.8312496 ]])
kmedoids = KMedoids(n_clusters=3, random_state=0).fit(X)
print('Label   Medoid       Index')
print('---------------------------')
# medoid_indices_ holds the row of X that serves as each cluster's medoid
for index in kmedoids.medoid_indices_:
    label = kmedoids.labels_[index]
    medoid = X[index]
    print(f'{label:<7} {medoid} {index}')
This prints:
Label   Medoid       Index
---------------------------
0       [0.85566274] 1
1       [0.85042336] 5
2       [0.82019115] 7
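The loop above relies on the medoid_indices_ attribute. If only the array of centers is at hand, as with cluster_centers_ in the question's code, the indices can be recovered by matching each center to its nearest point in X. The following is a minimal sketch under that assumption, using plain NumPy and the values copied from the question's output; the name medoid_indices is just illustrative.

```python
import numpy as np

# Data and centers copied from the question's printed output.
X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932,
              0.85042336, 0.84899714, 0.82019115, 0.86112067, 0.8312496]).reshape(-1, 1)
centroids = np.array([[0.85566274], [0.85042336], [0.82019115]])

# Distance from every center to every data point: shape (3, 10).
distances = np.linalg.norm(centroids[:, None, :] - X[None, :, :], axis=2)

# A medoid is one of the data points, so its nearest point in X is itself.
medoid_indices = distances.argmin(axis=1)

for centroid, index in zip(centroids, medoid_indices):
    print("For centroid", centroid, ",", index, X[index])
```

This matching only makes sense for medoids, which are actual members of X. If X contains duplicate values, argmin returns the first matching index.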
Alternatively, you could store the results in a pandas dataframe, as per your request:
import pandas as pd
df = pd.DataFrame({'label': kmedoids.labels_[kmedoids.medoid_indices_],
                   'medoid': np.squeeze(X[kmedoids.medoid_indices_]),
                   'index': kmedoids.medoid_indices_})
print(df)
This prints:
   label    medoid  index
0      0  0.855663      1
1      1  0.850423      5
2      2  0.820191      7
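If "centroid with its datapoint" is read as "each medoid with all the points in its cluster", the same data can be grouped by cluster. This is a rough sketch in plain NumPy that assigns every point to its nearest medoid, which I assume agrees with the labels the fitted model produces; the names labels and members are illustrative, not part of the library.

```python
import numpy as np

# Data and medoids copied from the question's printed output.
X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932,
              0.85042336, 0.84899714, 0.82019115, 0.86112067, 0.8312496]).reshape(-1, 1)
medoids = np.array([[0.85566274], [0.85042336], [0.82019115]])

# Label each point with the position of its nearest medoid.
# Broadcasting (10, 1) against (1, 3) gives a (10, 3) distance table.
labels = np.abs(X - medoids.T).argmin(axis=1)

# Collect the indices of the points that fall into each cluster.
members = {k: np.flatnonzero(labels == k) for k in range(len(medoids))}

for k, idx in members.items():
    print(f"Medoid {medoids[k, 0]}: indices {idx.tolist()}, values {X[idx, 0].tolist()}")
```

With a fitted model, the labels_ attribute used earlier could replace the nearest-medoid step.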
Upvotes: 3