Reputation: 89
I have the output Z from a linkage algorithm in MATLAB.
The structure of the output Z is given in this link https://uk.mathworks.com/help/stats/linkage.html (scroll down to output)
I am trying to find the genealogy of the internal nodes and leaves of the dendrogram. The genealogy is defined as the ordered set of of internal nodes connecting leaf i (internal node α_h) to the root α_1. I also want to be able to find the parent node - this is the node w of v in which w immediately precedes v on the path from the root to v. Would anyone be willing to explain how I can do this using MATLAB?
In case my definitions weren't clear enough, an example case is shown in the image.
The genealogy of leaf 3 is G(3)={α_7, α_2, α_1} and the genealogy of internal node α_7 is G(α_7)={α_7, α_2, α_1}. An example for the parents: the parent node of α_7 is α_2, which I denote as g(α_2)=α_7. I'm aware the indexing of the tree given the output Z from the hierarchical clustering is different from the image, so a code that is consistent with the way the dendrogram is indexed by Z is absolutely fine. I just care that the output is correct for the dendrogram.
I hope now it's clearer as to what I want the code to find and do using the input of Z. Your help is much appreciated!
Upvotes: 1
Views: 577
Reputation: 1286
A small example makes things clearer:
X = [1,2,4,5,8]';
D = pdist(X);
L = linkage(D, 'ward');
dendrogram(L, 'labels', cellfun(@num2str, num2cell(X), 'uniform', false));
If you add a fourth column to L with
L(:, end+1) = (size(L, 1) + (1 : size(L, 1)));
then every row of L has the entries ["number of node", "number of node", "merge at level", "number of parent"].
So the root node is always L(end, end), the leave node have a number up to size(X, 1), the inner nodes have numbers greater than size(X, 1).
In your example L will approximately look like this:
L = [...
9, 10, 1, 10; % 10 = alpha_9
5, 6, 2, 11; % 11 = alpha_8
3, 4, 3, 12; % 12 = alpha_7
1, 2, 4, 13; % 13 = alpha_6
7, 12, 5, 14; % 14 = alpha_5
8, 15, 6, 15; % 15 = alpha_4
16, 11, 7, 16; % 16 = alpha_3
14, 13, 8, 17; % 17 = alpha_2
18, 17, 9, 18; % 18 = alpha_1
];
dendrogram(L);
Note that the labels are already correct by construction.
Upvotes: 0