Reputation: 77
I plot a dendrogram using the code below.
from scipy.cluster.hierarchy import linkage, dendrogram, complete, to_tree
from scipy.spatial.distance import squareform
import numpy as np
import pandas as pd
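# Symmetric pairwise distance matrix for the five samples
diff_matrix = [[0, 0, 1, 0, 1],
               [0, 0, 1, 0, 1],
               [1, 1, 0, 1, 2],
               [0, 0, 1, 0, 1],
               [1, 1, 2, 1, 0]]
# In my script the labels come from list(df.index); they are written out here
labels = ['Sample01', 'Sample02', 'Sample03', 'Sample04', 'Sample05']
linkage_matrix = linkage(squareform(diff_matrix), 'complete')
dendrogram_info = dendrogram(linkage_matrix, labels=labels)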
The generated dendrogram_info is a dict that looks like this:
{'icoord': [[35.0, 35.0, 45.0, 45.0],
[25.0, 25.0, 40.0, 40.0],
[15.0, 15.0, 32.5, 32.5],
[5.0, 5.0, 23.75, 23.75]],
'dcoord': [[0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0],
[0.0, 1.0, 1.0, 0.0],
[0.0, 2.0, 2.0, 1.0]],
'ivl': ['Sample05', 'Sample03', 'Sample04', 'Sample01', 'Sample02'],
'leaves': [4, 2, 3, 0, 1],
'color_list': ['C1', 'C1', 'C1', 'C0'],
'leaves_color_list': ['C0', 'C1', 'C1', 'C1', 'C1']}
I want to extract the point coordinates corresponding to the samples, using the following code:
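leaf_coords = []
for x, y in zip(dendrogram_info['icoord'], dendrogram_info['dcoord']):
    # Treat the left end of a link as a leaf if it sits at height 0
    if y[0] == 0:
        leaf_coords.append([x[0], y[0]])
    # Same check for the right end of the link
    if y[3] == 0:
        leaf_coords.append([x[3], y[3]])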
Here is the resulting leaf_coords list. It contains two points, [40.0, 0] and [32.5, 0], that should not be included. My question is: how can I get only the coordinates of the points that belong to the samples (labels), as the arrows indicate in the picture?
[[35.0, 0.0],
[45.0, 0.0],
[25.0, 0.0],
[40.0, 0.0],
[15.0, 0.0],
[32.5, 0.0],
[5.0, 0.0]]
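For reference, the two extra points appear to come from clusters that were merged at distance 0: their link ends also sit at y = 0, so the `y == 0` test cannot tell them apart from real leaves. A minimal sketch of one possible workaround follows. It assumes the default `orientation='top'` layout, in which SciPy seems to place the i-th leaf (left to right, in the order of `'ivl'`) at x = 5 + 10 * i; that matches the 5, 15, ..., 45 values in `icoord` above. The dict below copies only the two keys needed from the output shown earlier, and `leaf_coordinates` is a hypothetical helper name, not a SciPy function.

```python
# Only the keys needed here, copied from the dendrogram_info output above.
dendrogram_info = {
    'ivl': ['Sample05', 'Sample03', 'Sample04', 'Sample01', 'Sample02'],
    'leaves': [4, 2, 3, 0, 1],
}


def leaf_coordinates(info, spacing=10.0, offset=5.0):
    """Map each leaf label to its (x, y) position on the plot.

    Assumption: default orientation, with the i-th leaf drawn at
    x = offset + spacing * i and every leaf at height y = 0.
    """
    return {
        label: (offset + spacing * i, 0.0)
        for i, label in enumerate(info['ivl'])
    }


leaf_coords = leaf_coordinates(dendrogram_info)
for label, (x, y) in leaf_coords.items():
    print(label, x, y)
```

Because the positions are derived from the leaf order rather than from `dcoord`, zero-height merges such as the ones at x = 40.0 and x = 32.5 never enter the result.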
Upvotes: 0
Views: 68