Hierarchical agglomerative clustering: how to update distance matrix?

Question

I would like to implement the simple hierarchical agglomerative clustering according to the pseudocode:

I got stuck at the last part where I need to update the distance matrix. So far I have:

import numpy as np

X = np.array([[1, 2],
              [0, 3],
              [2, 3]])

# Pairwise distance matrix between clusters
C = np.zeros((X.shape[0], X.shape[0]))

# Keeps track of active clusters (1 = active, 0 = merged away)
I = np.zeros(X.shape[0])

# For all n datapoints
for n in range(X.shape[0]):
    for i in range(X.shape[0]):
        # Compute the Euclidean distance for all N x N pairs of datapoints
        C[n][i] = np.linalg.norm(X[n] - X[i])
    # Every datapoint starts out as its own active cluster
    I[n] = 1

# Collects clustering as a sequence of merges
A = []
# In each of N - 1 iterations
for k in range(X.shape[0] - 1):
    # TODO: Find the indices of the smallest distance
    # TODO: Update the distance matrix
    pass

I would like to implement single-linkage clustering, so I need to find the argmin of the distance matrix. I originally thought about doing something like:

    i, m = np.where(C == np.min(C[np.nonzero(C)]))
    i, m = i[0], m[0]
    A.append((i, m))

to find the argmin, but I think it is incorrect because it doesn't restrict the search to the active clusters in I. I am also confused because I should only be looking at the upper or lower triangle of the matrix: since the matrix is symmetric, the above method could return the same pair twice, once as (i, m) and once as (m, i).
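For illustration, here is one way the search could be restricted to active clusters and to the upper triangle. This is only a sketch: the helper name `masked_argmin` is made up for this example, and it assumes `C` is a full symmetric distance matrix and `I` holds 1 for active clusters. It works on a copy and sets every entry that must be ignored to infinity before taking the argmin.

```python
import numpy as np

def masked_argmin(C, I):
    """Return (i, m) with i < m for the closest pair of active clusters.

    Hypothetical helper: C is a symmetric distance matrix and
    I is a 0/1 vector marking which clusters are still active.
    """
    D = C.astype(float).copy()
    inactive = ~I.astype(bool)
    # Rows and columns of inactive clusters can never be chosen
    D[inactive, :] = np.inf
    D[:, inactive] = np.inf
    # Mask the diagonal and lower triangle so each pair appears once (i < m)
    D[np.tril_indices_from(D)] = np.inf
    i, m = np.unravel_index(np.argmin(D), D.shape)
    return int(i), int(m)

X = np.array([[1, 2],
              [0, 3],
              [2, 3]])
# Full pairwise Euclidean distance matrix via broadcasting
C = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
I = np.ones(X.shape[0])
i, m = masked_argmin(C, I)
```

Because the diagonal is masked explicitly, this avoids the `np.nonzero` trick, which would also skip genuine zero distances between duplicate points. When several pairs tie for the minimum, `np.argmin` returns the first one in row-major order.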

I was also thinking about first creating the rows and columns of the new merged cluster:

C = np.vstack((C, np.zeros((1, C.shape[1]))))
C = np.hstack((C, np.zeros((C.shape[0], 1))))

Then somehow update it like:

for j in range(X.shape[0]):
    C[i][j] = min(C[i][j], C[m][j])
    C[j][i] = min(C[i][j], C[m][j])

I am not sure if this is the right approach. Is there a simpler way to find the argmin, merge the rows and columns, and update the values?
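As a possible alternative to growing the matrix with `vstack`/`hstack`, the merged cluster could reuse row and column `i`, with cluster `m` simply marked inactive. The following is a minimal sketch under that assumption, not a reference implementation; the names `single_linkage`, `active`, and `merges` are invented for this example and play the roles of `I` and `A` above.

```python
import numpy as np

def single_linkage(X):
    """Sketch of single-linkage clustering; returns a list of merges (i, m).

    Cluster m is merged into cluster i, which keeps its index, so the
    distance matrix never has to grow.
    """
    n = X.shape[0]
    C = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(C, np.inf)          # a cluster is never merged with itself
    active = np.ones(n, dtype=bool)      # plays the role of I
    merges = []                          # plays the role of A

    for _ in range(n - 1):
        # Only pairs where both clusters are still active are candidates
        D = np.where(active[:, None] & active[None, :], C, np.inf)
        i, m = np.unravel_index(np.argmin(D), D.shape)
        i, m = int(min(i, m)), int(max(i, m))
        merges.append((i, m))

        # Single linkage: distance to the merged cluster is the minimum
        # of the distances to its two parts
        merged = np.minimum(C[i], C[m])
        C[i, :] = merged
        C[:, i] = merged
        C[i, i] = np.inf
        active[m] = False                # m now lives inside cluster i

    return merges

X = np.array([[1, 2],
              [0, 3],
              [2, 3]])
merges = single_linkage(X)
```

Updating the whole row with `np.minimum` replaces the explicit loop over `j`, and writing the same vector into both the row and the column keeps the matrix symmetric. Each iteration rescans the full matrix, so this sketch runs in O(N^3) time overall, which is fine for small inputs.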
