Reputation: 127
import scipy.cluster.hierarchy as shc
Xi = [[0,5,10,8,3],[5,0,1,3,2],[10,1,0,5,1],[8,3,5,0,6],[3,2,1,6,0]]
Xi is a distance matrix.
shc.fcluster(shc.linkage(Xi, 'complete'), 9, criterion='distance')
In this code the threshold is 9.
After clustering, the result is array([3, 1, 1, 2, 1], dtype=int32).
I don't understand why it is not array([2, 1, 1, 1, 1]).
This image shows the situation after clustering: https://drive.google.com/file/d/17806FuPuNpJiqhT12jiuFOMGNUvB1vjT/view?usp=sharing
Upvotes: 1
Views: 2044
Reputation: 2249
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import pdist
import matplotlib.pyplot as plt
import seaborn as sns
You have this distance matrix
Xi = np.array([[0,5,10,8,3],[5,0,1,3,2],[10,1,0,5,1],[8,3,5,0,6],[3,2,1,6,0]])
which we can visualize as a heatmap
df = pd.DataFrame(Xi)
# mask the upper triangle (diagonal included) so each pair is shown only once
mask = np.zeros_like(Xi, dtype=bool)
mask[np.triu_indices_from(mask)] = True
sns.heatmap(df, annot=True, fmt='.0f', cmap="YlGnBu", mask=mask);
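Now we get the pairwise distances with pdist. Note that pdist treats each row of Xi as an observation and computes Euclidean distances between the rows, which is also what linkage does when it is given a square 2-D array like Xi
p = pdist(Xi)
and the linkage
Z = linkage(p, method='complete')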
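To make that step concrete, here is a small sketch (the names d01_manual, Z_square and Z_pdist are only for illustration) that compares one pdist value with a hand-computed Euclidean distance between two rows, and checks that linkage on the square array builds the same tree as linkage on pdist(Xi):

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

Xi = np.array([[0,5,10,8,3],[5,0,1,3,2],[10,1,0,5,1],[8,3,5,0,6],[3,2,1,6,0]])

# pdist treats every row of Xi as a 5-dimensional point (Euclidean metric by default)
p = pdist(Xi)

# The condensed order is (0,1), (0,2), (0,3), (0,4), (1,2), ... so p[0] is the pair (0, 1)
d01_manual = np.sqrt(((Xi[0] - Xi[1]) ** 2).sum())

# linkage on the square 2-D array goes through the same row-wise conversion
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # SciPy may warn that Xi looks like a distance matrix
    Z_square = linkage(Xi, method='complete')
Z_pdist = linkage(p, method='complete')

print(p[0], d01_manual, Xi[0, 1])     # the first two agree, the matrix entry does not
print(np.allclose(Z_square, Z_pdist))
```

So the heights in the dendrogram are distances between rows of Xi, not the entries of Xi themselves.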
You set 9 as the threshold, so
dendrogram(Z)
plt.axhline(9, color='k', ls='--');
you have 3 clusters
fcluster(Z, 9, criterion='distance')
array([3, 1, 1, 2, 1], dtype=int32)
#      0  1  2  3  4   <- elements
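The same count can be read from the linkage matrix without the plot. A minimal sketch, assuming the Z built above with pdist and complete linkage (heights and n_clusters are illustrative names): the third column of Z holds the height of each merge, and for complete linkage every merge above the cut leaves one extra cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

Xi = np.array([[0,5,10,8,3],[5,0,1,3,2],[10,1,0,5,1],[8,3,5,0,6],[3,2,1,6,0]])
Z = linkage(pdist(Xi), method='complete')

heights = Z[:, 2]          # distance at which each of the 4 merges happens
threshold = 9

# merges above the cut are not applied, and each of them leaves one more cluster
n_clusters = 1 + int((heights > threshold).sum())

print(np.round(heights, 2))
print(n_clusters)
```

Here the merge heights are roughly 4.58, 7.28, 10.68 and 15.13, so a cut at 9 sits between the second and third merge.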
and it's correct, you can verify with the dendrogram that
1, 2 and 4 are in cluster 1
3 is in cluster 2
0 is in cluster 3
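If you prefer to check the membership in code rather than on the plot, something along these lines groups the element indices by their fcluster label (members is just an illustrative name):

```python
from collections import defaultdict

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

Xi = np.array([[0,5,10,8,3],[5,0,1,3,2],[10,1,0,5,1],[8,3,5,0,6],[3,2,1,6,0]])
Z = linkage(pdist(Xi), method='complete')
labels = fcluster(Z, 9, criterion='distance')

# collect the element indices that share a cluster label
members = defaultdict(list)
for element, label in enumerate(labels):
    members[int(label)].append(element)

for label in sorted(members):
    print(label, members[label])
```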
If you want only two clusters, you have to choose a higher threshold, for example 12
dendrogram(Z)
plt.axhline(12, color='k', ls='--');
and so you have your expected result
fcluster(Z, 12, criterion='distance')
array([2, 1, 1, 1, 1], dtype=int32)
#      0  1  2  3  4   <- elements
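One caveat, in case the entries of Xi are already the pairwise distances you want to cluster on (as the question says) rather than coordinates: linkage expects such a matrix in condensed form. A possible sketch, assuming that reading of Xi, uses squareform to do the conversion (condensed, Z_pre and labels_pre are illustrative names):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

Xi = np.array([[0,5,10,8,3],[5,0,1,3,2],[10,1,0,5,1],[8,3,5,0,6],[3,2,1,6,0]], dtype=float)

# squareform turns the symmetric, zero-diagonal matrix into the condensed vector
condensed = squareform(Xi)
Z_pre = linkage(condensed, method='complete')

# the merge heights are now taken from the entries of Xi themselves
print(Z_pre[:, 2])

labels_pre = fcluster(Z_pre, 9, criterion='distance')
print(labels_pre)
```

With this input the merges happen at 1, 2, 6 and 10, so the threshold 9 already separates element 0 from the other four, which is the two-cluster grouping the question expected.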
Upvotes: 3