Reputation: 677
I am performing hierarchical clustering with Python.
from scipy.cluster.hierarchy import dendrogram, linkage
from matplotlib import pyplot as plt

# Build the hierarchy with complete linkage
linked = linkage(dataset, 'complete')
labelList = list(range(len(dataset)))

fig = plt.figure(figsize=(10, 7))
fig.patch.set_facecolor('white')
dendrogram(linked,
           orientation='top',
           labels=labelList,
           distance_sort='descending',
           show_leaf_counts=True)
plt.show()
Here is the dendrogram I get.
There are two classes. I am now trying to get the indices of each class by passing n_clusters=2 to the function AgglomerativeClustering.
from sklearn.cluster import AgglomerativeClustering
cluster = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='ward')
output = cluster.fit_predict(dataset)
output
array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1])
These two classes are different from those in the dendrogram. Currently, I identify the indices of each class manually from the dendrogram.
Is there a way to do that automatically?
Why does the function AgglomerativeClustering yield different results than the dendrogram?
EDIT: There must be some way to match the output of the two functions dendrogram and AgglomerativeClustering.
Upvotes: 1
Views: 1890
Reputation: 20302
To choose the number of clusters, find the tallest vertical line that is not crossed by any horizontal line and cut the dendrogram at that height. I don't see your horizontal cut line here, but you have two clusters, shown as the orange and green branches. Read the link below for more info.
https://ml2021.medium.com/clustering-with-python-hierarchical-clustering-a60688396945
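As for why AgglomerativeClustering disagrees with the dendrogram: the dendrogram was built with 'complete' linkage, while AgglomerativeClustering was called with linkage='ward', so the two trees are genuinely different. To get the cluster indices automatically, you can cut the same linkage matrix with scipy.cluster.hierarchy.fcluster. Here is a minimal sketch, assuming dataset is an (n_samples, n_features) array as in the question; the random data below is only a placeholder:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import AgglomerativeClustering

# Placeholder data; substitute your own (n_samples, n_features) array.
dataset = np.random.rand(10, 2)

# Same linkage method as the dendrogram ('complete'), so the flat
# clusters correspond to the colored branches in the plot.
linked = linkage(dataset, method='complete')

# Cut the tree so that at most 2 flat clusters remain.
# fcluster labels start at 1, so subtract 1 to get 0-based labels.
scipy_labels = fcluster(linked, t=2, criterion='maxclust') - 1

# AgglomerativeClustering with the SAME linkage should agree
# (up to a possible swap of the label values 0 and 1).
sklearn_labels = AgglomerativeClustering(
    n_clusters=2, linkage='complete').fit_predict(dataset)

# Indices of the points in each cluster:
for k in range(2):
    print(f"cluster {k}: {np.where(scipy_labels == k)[0]}")

With criterion='maxclust', fcluster returns flat labels for at most t clusters; using the same linkage method in both libraries is what makes the flat clusters agree, up to a possible relabeling.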
Upvotes: 1