Ansh
Ansh

Reputation: 89

How to find the ancestor and parent nodes from a dendrogram in MATLAB

I have the output Z from a linkage algorithm in MATLAB.

The structure of the output Z is given in this link https://uk.mathworks.com/help/stats/linkage.html (scroll down to output)

I am trying to find the genealogy of the internal nodes and leaves of the dendrogram. The genealogy is defined as the ordered set of of internal nodes connecting leaf i (internal node α_h) to the root α_1. I also want to be able to find the parent node - this is the node w of v in which w immediately precedes v on the path from the root to v. Would anyone be willing to explain how I can do this using MATLAB?

In case my definitions weren't clear enough, an example case is shown in the image.

Example

The genealogy of leaf 3 is G(3)={α_7, α_2, α_1} and the genealogy of internal node α_7 is G(α_7)={α_7, α_2, α_1}. An example for the parents: the parent node of α_7 is α_2, which I denote as g(α_2)=α_7. I'm aware the indexing of the tree given the output Z from the hierarchical clustering is different from the image, so a code that is consistent with the way the dendrogram is indexed by Z is absolutely fine. I just care that the output is correct for the dendrogram.

I hope now it's clearer as to what I want the code to find and do using the input of Z. Your help is much appreciated!

Upvotes: 1

Views: 577

Answers (1)

JoergVanAken
JoergVanAken

Reputation: 1286

A small example makes things clearer:

X = [1,2,4,5,8]';    
D = pdist(X);
L = linkage(D, 'ward');
dendrogram(L, 'labels', cellfun(@num2str, num2cell(X), 'uniform', false));

If you add a fourth column to L with

L(:, end+1) = (size(L, 1) + (1 : size(L, 1)));

then every row of L has the entries ["number of node", "number of node", "merge at level", "number of parent"].

So the root node is always L(end, end), the leave node have a number up to size(X, 1), the inner nodes have numbers greater than size(X, 1).

In your example L will approximately look like this:

L = [...
    9, 10, 1, 10; % 10 = alpha_9
    5,  6, 2, 11; % 11 = alpha_8
    3,  4, 3, 12; % 12 = alpha_7
    1,  2, 4, 13; % 13 = alpha_6
    7, 12, 5, 14; % 14 = alpha_5
    8, 15, 6, 15; % 15 = alpha_4
   16, 11, 7, 16; % 16 = alpha_3
   14, 13, 8, 17; % 17 = alpha_2
   18, 17, 9, 18; % 18 = alpha_1
   ];
dendrogram(L);

Note that the labels are already correct by construction.

Upvotes: 0

Related Questions