Reputation: 35696
According to this we can get labels for non-singleton clusters.
I tried this with a simple example.
import numpy as np
import scipy.cluster.hierarchy
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
mat = np.array([[0., 1., 3., 0., 2., 3., 1.],
                [1., 0., 3., 1., 1., 2., 2.],
                [3., 3., 0., 3., 3., 3., 4.],
                [0., 1., 3., 0., 2., 3., 1.],
                [2., 1., 3., 2., 0., 1., 3.],
                [3., 2., 3., 3., 1., 0., 3.],
                [1., 2., 4., 1., 3., 3., 0.]])

def llf(id):
    if id < n:
        return str(id)
    else:
        return '[%d %d %1.2f]' % (id, count, R[n-id,3])

linkage_matrix = linkage(mat, "complete")
dendrogram(linkage_matrix,
           p=4,
           leaf_label_func=llf,
           color_threshold=1,
           truncate_mode='lastp',
           distance_sort='ascending')
plt.show()
What are n and count here? In the resulting diagram, I need to know which observations are grouped under the leaves labelled (3) and (2).
Upvotes: 0
Views: 1059
Reputation: 54340
I think the documentation is not very clear on this point, and the sample code in it does not even run. But it is clear that the label 1 means the 2nd observation (labels are zero-based) and (3) means there are 3 observations in that node.
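For reference, a working version of the documentation's label function might look like the sketch below. It assumes that n is meant to be the number of original observations, and that count and the merge distance come from the linkage matrix row for that cluster (row id - n, columns 3 and 2). The names Z and node_id are my own, and no_plot=True is used so that only the labels are computed.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

mat = np.array([[0., 1., 3., 0., 2., 3., 1.],
                [1., 0., 3., 1., 1., 2., 2.],
                [3., 3., 0., 3., 3., 3., 4.],
                [0., 1., 3., 0., 2., 3., 1.],
                [2., 1., 3., 2., 0., 1., 3.],
                [3., 2., 3., 3., 1., 0., 3.],
                [1., 2., 4., 1., 3., 3., 0.]])

# As in the question, mat is passed as-is, so each row is treated as an
# observation vector. If mat were meant as a distance matrix, it would
# have to be condensed first with scipy.spatial.distance.squareform.
Z = linkage(mat, "complete")

# Assumption: n is the number of original observations.
n = Z.shape[0] + 1


def llf(node_id):
    """Label a leaf of the (possibly truncated) dendrogram."""
    if node_id < n:
        # Original observation: use its index as the label.
        return str(node_id)
    # Non-singleton cluster: row node_id - n of Z describes its merge.
    row = Z[node_id - n]
    count = int(row[3])   # number of observations in the cluster
    dist = row[2]         # distance at which the cluster was formed
    return '[%d %d %1.2f]' % (node_id, count, dist)


D = dendrogram(Z, p=4, truncate_mode='lastp',
               leaf_label_func=llf, no_plot=True)
print(D['ivl'])
```

Each non-singleton leaf is then labelled with its cluster id, the number of observations it contains, and the distance at which it was formed.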
If your question is which 3 observations are in the 2nd node:
In [51]:
D4 = dendrogram(linkage_matrix,
                color_threshold=1,
                p=4,
                truncate_mode='lastp',
                distance_sort='ascending')
# Full (untruncated) leaf order; no_plot=True only computes the layout.
D7 = dendrogram(linkage_matrix,
                p=7,
                truncate_mode='lastp',
                distance_sort='ascending', no_plot=True)
from itertools import groupby
# Labels missing from the truncated plot belong to the collapsed (k) nodes.
[list(group) for key, group in groupby(D7['ivl'], lambda x: x in D4['ivl'])]
Out[51]:
[['1'], ['6', '0', '3'], ['2'], ['4', '5']]
The 2nd node, labelled (3), contains the 7th, 1st and 4th observations, and the 4th node, labelled (2), contains the 5th and 6th observations.
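The groupby trick relies on the leaf order: two adjacent singleton leaves, or two adjacent collapsed nodes, would end up merged into one group. A sketch of an alternative that does not depend on ordering is to look up each truncated leaf in the tree returned by to_tree. It assumes D['leaves'] holds the cluster ids of the plotted leaves, which is how the SciPy documentation describes it. The names Z, nodes and groups are my own.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, to_tree

mat = np.array([[0., 1., 3., 0., 2., 3., 1.],
                [1., 0., 3., 1., 1., 2., 2.],
                [3., 3., 0., 3., 3., 3., 4.],
                [0., 1., 3., 0., 2., 3., 1.],
                [2., 1., 3., 2., 0., 1., 3.],
                [3., 2., 3., 3., 1., 0., 3.],
                [1., 2., 4., 1., 3., 3., 0.]])
Z = linkage(mat, "complete")

# Truncated dendrogram; no_plot=True skips drawing and returns the layout.
D4 = dendrogram(Z, p=4, truncate_mode='lastp',
                distance_sort='ascending', no_plot=True)

# nodes[i] is the ClusterNode whose id is i (leaves and merged clusters).
_, nodes = to_tree(Z, rd=True)

# Pair every plotted label with the original observations under that leaf.
# pre_order() returns the ids of the leaf nodes below a cluster node.
groups = [(label, sorted(nodes[node_id].pre_order()))
          for label, node_id in zip(D4['ivl'], D4['leaves'])]

for label, members in groups:
    print(label, members)
```

A list of pairs is used rather than a dict because two collapsed nodes can share the same label, for example two leaves both shown as (2).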
Upvotes: 1