lena

Reputation: 727

How to print the data of each cluster in the agglomerative clustering algorithm in Python

I'm new to machine learning tools in Python. I wrote this code for agglomerative hierarchical clustering, but I don't know whether there is a way to print the data of each plotted cluster. The input of the algorithm is 5 points (indices 0, 1, 2, 3, 4). In addition to drawing the clusters, I need to print the members of each cluster separately, something like this: cluster1 = [1, 2, 4], cluster2 = [0, 3].

Update: I want to get the data that is drawn and colored by this line and the other similar lines: plt.scatter(points[y_hc == 0, 0], points[y_hc == 0, 1], s=100, c='cyan'). According to this code, the points (1, 2, 4) end up in one cluster and get the same color, while (0, 3) are in cluster 2, so I need to print these data (the members of each cluster) in the terminal. This code only draws them.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering

# Generate 5 sample points with 2 features each
dataset = make_blobs(n_samples=5, n_features=2, centers=4, cluster_std=1.6, random_state=50)
points = dataset[0]

# Draw the dendrogram and the raw points
dendrogram = sch.dendrogram(sch.linkage(points, method='ward'))
plt.scatter(points[:, 0], points[:, 1])

# Cluster into 4 groups and color each cluster separately
hc = AgglomerativeClustering(n_clusters=4, affinity='euclidean', linkage='ward')
y_hc = hc.fit_predict(points)
plt.scatter(points[y_hc == 0, 0], points[y_hc == 0, 1], s=100, c='cyan')
plt.scatter(points[y_hc == 1, 0], points[y_hc == 1, 1], s=100, c='yellow')
plt.scatter(points[y_hc == 2, 0], points[y_hc == 2, 1], s=100, c='red')
plt.scatter(points[y_hc == 3, 0], points[y_hc == 3, 1], s=100, c='green')
plt.show()

Upvotes: 0

Views: 4495

Answers (1)

Chris

Reputation: 1668

Having done some research, it seems that there isn't an easy way to get the cluster labels directly from SciPy's dendrogram function.

Below are a couple of options/workarounds.

Option One

Use scipy's linkage and fcluster functions to perform the clustering and get the labels:

Z = sch.linkage(points, 'ward') # Note 'ward' is specified here to match the linkage used in sch.dendrogram.
labels = sch.fcluster(Z, t=10, criterion='distance') # t chosen to return two clusters.

# Cluster 1
np.where(labels == 1)

Outputs: (array([0, 3]),)

# Cluster 2
np.where(labels == 2)

Outputs: (array([1, 2, 4]),)
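
If you also need the coordinates of the points in each cluster, rather than just their indices, you can use the same labels to index points. A minimal sketch reusing points and labels from above (fcluster numbers its clusters starting at 1):

for k in np.unique(labels):
    idx = np.where(labels == k)[0]            # indices of the points in cluster k
    print(f"cluster{k} = {idx.tolist()}")     # e.g. cluster1 = [0, 3]
    print(points[labels == k])                # the (x, y) coordinates of those points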

Option Two

Modify your current use of sklearn to return two clusters:

hc = AgglomerativeClustering(n_clusters=2, affinity='euclidean',linkage='ward') # Again, 'ward' is specified here to match the linkage in sch.dendrogram.
y_hc = hc.fit_predict(points)

# Cluster 1
np.where(y_hc == 0)

Outputs: (array([0, 3]),)

# Cluster 2
np.where(y_hc == 1)

Outputs: (array([1, 2, 4]),)
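
(If you are on a recent scikit-learn release, note that the affinity argument has been renamed to metric, so you may need to adjust that call.)

Either way, producing the cluster1 = [...] / cluster2 = [...] output you asked for is just a loop over the unique labels. A minimal sketch using y_hc from Option Two (sklearn numbers its labels from 0, and which group is called "cluster 1" is arbitrary):

clusters = {k: np.where(y_hc == k)[0].tolist() for k in np.unique(y_hc)}
for k, members in clusters.items():
    print(f"cluster{k + 1} = {members}")      # prints cluster1 = [0, 3] and cluster2 = [1, 2, 4]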

Upvotes: 2
