Kyv

Reputation: 677

Dendrogram analysis of Hierarchical clustering algorithm

I am performing Hierarchical Clustering with python.

    from scipy.cluster.hierarchy import dendrogram, linkage
    from matplotlib import pyplot as plt

    linked = linkage(dataset, 'complete')

    labelList = list(range(len(dataset)))
    
    fig = plt.figure(figsize=(10, 7))
    fig.patch.set_facecolor('white')

    dendrogram(linked,
                orientation='top',
                labels=labelList,
                distance_sort='descending',
                show_leaf_counts=True)
    plt.show()

Here is the dendrogram I get.

[Image: HCA dendrogram]

There are two classes. I am now trying to get the indices of each class, while giving n_clusters=2 in the function AgglomerativeClustering.

    from sklearn.cluster import AgglomerativeClustering
    cluster = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='ward')
    output = cluster.fit_predict(dataset)

output

array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1])

These two classes are different from those in the dendrogram. At the moment I note the indices of each class manually from the dendrogram.
Is there a way to do that automatically? And why does AgglomerativeClustering yield different results than the dendrogram?

EDIT: There should be some way to match the output of the two functions dendrogram and AgglomerativeClustering.
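(For anyone with the same question: the labels can be read off automatically by cutting the same linkage matrix the dendrogram was built from, using scipy's `fcluster`. A minimal sketch, with a made-up two-blob dataset standing in for the original `dataset`:)

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical dataset standing in for the original one: two well-separated blobs
rng = np.random.default_rng(0)
dataset = np.vstack([rng.normal(0, 0.5, (6, 2)),
                     rng.normal(5, 0.5, (4, 2))])

# Same linkage matrix the dendrogram was built from ('complete' linkage)
linked = linkage(dataset, 'complete')

# Cut the tree into exactly 2 flat clusters
labels = fcluster(linked, t=2, criterion='maxclust')

# Print the indices belonging to each cluster
for c in np.unique(labels):
    print(c, np.where(labels == c)[0])
```

Because `fcluster` operates on the very same linkage matrix passed to `dendrogram`, the resulting partition is guaranteed to agree with the tree in the plot.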

Upvotes: 1

Views: 1890

Answers (1)

ASH

Reputation: 20302

You find the tallest vertical line that is not cut by any horizontal line. I don't see your horizontal cut line here, but you have two clusters, shown in orange and green. Read the link below for more info.

https://ml2021.medium.com/clustering-with-python-hierarchical-clustering-a60688396945
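One more point worth noting: the mismatch in the question most likely comes from the dendrogram being built with `'complete'` linkage while `AgglomerativeClustering` was given `linkage='ward'`. A sketch (with a made-up two-blob dataset in place of the original) showing that the two agree once the same linkage method is used on both sides:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import AgglomerativeClustering

# Hypothetical well-separated dataset in place of the original
rng = np.random.default_rng(42)
dataset = np.vstack([rng.normal(0, 0.3, (5, 2)),
                     rng.normal(4, 0.3, (5, 2))])

# scipy side: same 'complete' method as the dendrogram
scipy_labels = fcluster(linkage(dataset, 'complete'),
                        t=2, criterion='maxclust')

# sklearn side: match the linkage method instead of using 'ward'
sk_labels = AgglomerativeClustering(n_clusters=2,
                                    linkage='complete').fit_predict(dataset)

# Label names differ (1/2 vs 0/1) but the partitions coincide
print(scipy_labels)
print(sk_labels)
```

The cluster *names* still differ (scipy numbers clusters from 1, sklearn from 0), but each pair of points ends up in the same group under both functions.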

Upvotes: 1
