崔箐坡

Reputation: 77

How to get the scipy dendrogram leaf coordinates?

I plot a dendrogram using the code below.

from scipy.cluster.hierarchy import linkage, dendrogram, complete, to_tree
from scipy.spatial.distance import squareform
import numpy as np
import pandas as pd

diff_matrix = [[0, 0, 1, 0, 1],
               [0, 0, 1, 0, 1],
               [1, 1, 0, 1, 2],
               [0, 0, 1, 0, 1],
               [1, 1, 2, 1, 0]]

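# df was not defined in the original post; these labels are inferred from the 'ivl' output below
df = pd.DataFrame(diff_matrix, index=['Sample01', 'Sample02', 'Sample03', 'Sample04', 'Sample05'])

linkage_matrix = linkage(squareform(diff_matrix), 'complete')
dendrogram_info = dendrogram(linkage_matrix, labels=list(df.index))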

The generated dendrogram_info is a dict that looks like this:

{'icoord': [[35.0, 35.0, 45.0, 45.0],
  [25.0, 25.0, 40.0, 40.0],
  [15.0, 15.0, 32.5, 32.5],
  [5.0, 5.0, 23.75, 23.75]],
 'dcoord': [[0.0, 0.0, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0],
  [0.0, 1.0, 1.0, 0.0],
  [0.0, 2.0, 2.0, 1.0]],
 'ivl': ['Sample05', 'Sample03', 'Sample04', 'Sample01', 'Sample02'],
 'leaves': [4, 2, 3, 0, 1],
 'color_list': ['C1', 'C1', 'C1', 'C0'],
 'leaves_color_list': ['C0', 'C1', 'C1', 'C1', 'C1']}

I want to extract the point coordinates corresponding to the samples, using the following code:

leaf_coords = []
for x, y in zip(dendrogram_info['icoord'], dendrogram_info['dcoord']):
    if y[0]==0:
        leaf_coords.append([x[0], y[0]])
    if y[3]==0:
        leaf_coords.append([x[3], y[3]])

This produces the leaf_coords list below, but it contains two points, [40.0, 0] and [32.5, 0], that should not be included. They pass the y == 0 test because some samples have a distance of 0 and merge at height 0. My question is: how can I get only the coordinates of the points that belong to the samples (labels), as the arrow in the picture indicates?

[[35.0, 0.0],
 [45.0, 0.0],
 [25.0, 0.0],
 [40.0, 0.0],
 [15.0, 0.0],
 [32.5, 0.0],
 [5.0, 0.0]]

Scipy hierarchy dendrogram plot
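
One possible approach is to skip the y == 0 test and derive the leaf positions from the leaf order instead. This is a minimal sketch, not a definitive answer. It assumes scipy's default layout, in which the i-th entry of 'ivl' is drawn at 5 + 10 * i along the leaf axis at height 0. That matches the values 5, 15, 25, 35 and 45 in the 'icoord' output above. The names leaf_coordinates, spacing and offset are hypothetical helpers, not part of scipy. With orientation='left' or 'right', the two coordinates would presumably swap roles on the plot.

```python
# Output of dendrogram() as posted in the question (only the keys used here).
dendrogram_info = {
    'ivl': ['Sample05', 'Sample03', 'Sample04', 'Sample01', 'Sample02'],
    'leaves': [4, 2, 3, 0, 1],
}


def leaf_coordinates(info, spacing=10.0, offset=5.0):
    """Map each leaf label to its (x, y) position on the plot.

    Assumption: the i-th entry of info['ivl'] sits at
    x = offset + spacing * i on the leaf axis, at height 0.
    """
    return {label: (offset + spacing * i, 0.0)
            for i, label in enumerate(info['ivl'])}


leaf_coords = leaf_coordinates(dendrogram_info)
for label, xy in leaf_coords.items():
    print(label, xy)
```

Because the positions depend only on the leaf order, internal nodes that merge at height 0, such as [40.0, 0] and [32.5, 0], never enter the result.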
